SUBTILISIN VARIANTS CAPABLE OF CLEAVING SUBSTRATES
CONTAINING BASIC RESIDUES 



FIELD OF THE INVENTION 
This invention relates to subtilisin variants having a substrate specificity altered from that of wild-type subtilisins.
Specifically, the subtilisin variants are modified so that they efficiently and selectively cleave substrates
containing basic residues. The invention further relates to the DNA encoding these novel polypeptides, as well
as to the recombinant materials and methods for producing these subtilisin variants. In a particular aspect, the
present invention provides processes for cleaving protein substrates containing basic residues.

BACKGROUND OF THE INVENTION 

Site-specific proteolysis is one of the most common forms of post-translational modification of proteins
(for review see Neurath, H. (1989) Trends Biochem. Sci. 14:268). In addition, proteolysis of fusion proteins in
vitro is an important research and commercial tool (for reviews see Uhlen, M. and Moks, T. (1990) Methods
Enzymol. 185:129-143; Carter, P. (1990) in Protein Purification: From Molecular Mechanisms to Large-Scale
Processes, M.R. Ladisch, R.C. Willson, C.C. Painton, S.E. Builder, Eds. (ACS Symposium Series 427,
American Chemical Society, Washington, D.C.), Chap. 13, pp. 181-193; and Nilsson, B. et al. (1992) Current
Opin. Struct. Biol. 2:569). Expressing a protein of interest as a fusion protein facilitates purification when the
fusion contains an affinity domain such as glutathione-S-transferase, Protein A or a poly-histidine tail. The
fusion domain can also facilitate high-level expression and/or secretion.

Liberating the protein product from the fusion domain requires selective and efficient cleavage of the
fusion protein. Both chemical and enzymatic methods have been proposed (see references above). Enzymatic
methods are generally preferred because they tend to be more specific and can be performed under mild conditions
that avoid denaturation or unwanted chemical side reactions. A number of natural and even designed enzymes
have been applied to site-specific proteolysis. Although some are generally more useful than others (Forsberg,
G., Baastrup, B., Rondahl, H., Holmgren, E., Pohl, G., Hartmanis, M. and Lake, M. (1992) J. Prot. Chem.
11:201-211), no single enzyme is applicable to every situation, given the sequence requirements of the fusion
protein junction and the possible occurrence of protease cleavage sequences within the desired protein product.
Thus, an expanded array of sequence-specific proteases, analogous to restriction endonucleases, would make
site-specific proteolysis a more widely used method for processing fusion proteins or generating protein/peptide
fragments either in vitro or in vivo.

The processing of prohormones by the KEX2-related family of serine endoproteases illustrates one of
the most precise proteolytic events found in nature (for reviews see Steiner, D. F., Smeekens, S. P., Ohagi, S. and
Chan, S. J. (1992) J. Biol. Chem. 267:23435-23438 and Smeekens, S. P. (1993) Bio/Technology 11:182-186).
This family of proteases, which includes the yeast KEX2 and the mammalian PC2, PC3 and furin enzymes, is
homologous to the bacterial serine protease subtilisin (Kraut, J. (1977) Annu. Rev. Biochem. 46:331-358).
Subtilisin has a broad substrate specificity that reflects its role as a scavenger protease. In contrast, these
eukaryotic enzymes are very specific for cleaving substrates containing two basic residues and are thus well
suited for site-specific proteolysis.
All of these eukaryotic enzymes strongly require Arg at the P1 position, and either Arg, Lys or Pro at
the P2 position of peptide substrates. The prohormone convertases from higher eukaryotes such as furin, PC2
and Pa also have an absolute requirement for Arg at the P4 position (Bresnahan, P. A., Leduc, R., Thomas, L.,
Thorner, J., Gibson, H. L., Brake, A. J., Barr, P. J. and Thomas, G. (1990) J. Cell Biol. 111:2851; Wise, R. J.,
Barr, P. J., Wong, P. A., Kiefer, M. C., Brake, A. J. and Kaufman, R. J. (1990) Proc. Natl. Acad. Sci. USA
87:9378-9382; Hosaka, M., Nagahama, M., Kim, W.-S., Watanabe, T., Hatsuzawa, K., Ikemizu, J., Murakami,
K. and Nakayama, K. (1991) J. Biol. Chem. 266:12127-12130; Matthews, D. J., Goodman, L. J., Gorman, C.
M. and Wells, J. A. (1994) Protein Science 3:1197-1205).

Despite the very narrow specificity of the prohormone processing enzymes, in some cases they are
capable of rapid cleavage of target sequences. For example, the kcat/Km ratio for KEX2 cleaving a good
substrate (e.g., acetyl-pMYRK-MCA) is 1.1 x 10^7 M^-1 s^-1 (Brenner, C. and Fuller, R.S. (1992) Proc. Natl.
Acad. Sci. USA 89:922-926), compared to 3 x 10^5 M^-1 s^-1 for subtilisin cleaving a good substrate (e.g.,
suc-AAPF-pNA) (Estell, D. A., Graycar, T. P., Miller, J. V., Powers, D. B., Burnier, J. P., Ng, P. G. and
Wells, J. A. (1986) Science 233:659-663).

However, the eukaryotic proteases are expressed in small amounts (Bravo, D. B., Gleason, J. B.,
Sanchez, R. I., Roth, R. A. and Fuller, R. S. (1994) J. Biol. Chem. 269:25830-25837 and Matthews, D. J.,
Goodman, L. J., Gorman, C. M. and Wells, J. A. (1994) Protein Science 3:1197-1205), which at present makes
them impractical for processing fusion proteins in vitro. Subtilisin BPN', however, can be
expressed in large amounts (Wells, J.A., Ferrari, E., Henner, D.J., Estell, D.A. and Chen, E.Y. (1983) Nucl.
Acids Res. 11:7911-7929).

Extensive protein engineering studies of subtilisin, and especially subtilisin BPN', have identified
several residues in the S1 and S2 active site of the enzyme where amino acid substitutions lead to large changes
in substrate specificity (Wells, J. A. and Estell, D.A. (1988) Trends Biochem. Sci. 13:291-297; Carter, P., et
al. (1989) PROTEINS: Structure, Function, and Genetics 6:240-248). X-ray crystal structures of subtilisin
containing bound transition state analogues (Wright, C. S., Alden, R. A. and Kraut, J. (1969) Nature
221:235-242; McPhalen, C.A. and James, M.N.G. (1988) Biochemistry 27:6582-6598; Bode, W., Papamokos, E.,
Musil, D., Seemueller, U. and Fritz, H. (1986) EMBO J. 5:813-818; and Bott, R., Ultsch, M., Kossiakoff, A.,
Graycar, T., Katz, B. and Power, S. (1988) J. Biol. Chem. 263:7895-7906) can be used to locate active site
residues that are in close proximity to side chains at key positions in substrate peptides (Wells, J.A. (1987)
Proc. Natl. Acad. Sci. USA 84:1219-1223). Consideration of electrostatic interactions between charged peptide
substrates and subtilisin can be used to tailor the substrate binding cleft of subtilisin BPN' to favor
complementarily charged substrates (Wells, J.A., et al. (1987) Proc. Natl. Acad. Sci. USA 84:1219-1223).
Previous work has shown that replacement of residues at positions 156 and 166 in the S1 binding site of
subtilisin BPN' with various charged residues leads to improved specificity for complementarily charged substrates.
A substantial amount of protein engineering has been applied to the specificity determinants of the S4
subsite of subtilisin BPN' in efforts to alter specificity for P4 substrates (Eder, J., Rheinnecker, M. and Fersht,
A. R. (1993) FEBS Lett. 335:349-352; Rheinnecker, M., Baker, G., Eder, J. and Fersht, A. R. (1993)
Biochemistry 32:1199-1203; Rheinnecker, M., Eder, J., Pandey, P.S. and Fersht, A. R. (1994) Biochemistry
33:221-225). However, the mutations introduced consisted entirely of hydrophobic substitutions, thus preserving
the overall hydrophobic substrate preference of the site.

Previous attempts to introduce, remove or reverse charge specificity in enzyme active sites have been
met with considerable difficulty. This has generally been attributed to a lack of stabilization of the introduced
charge or enzyme-substrate ion pair complex by the wild-type enzyme environment (Hwang, J.K. and Warshel,
A. (1988) Nature 334:270-272). For example, Stennicke et al. (Stennicke, H.R., Ujje, H.M., Christensen, U.,
Remington, S.J. and Breddam (1994) Prot. Eng. 7:911-916) made acidic (D/E) mutations at five residues in
the P1' binding site of carboxypeptidase Y in an attempt to change the P1' preference from Phe to Lys/Arg.
Only the L272D and L272E mutations were found to alter the specificity in the desired direction, giving up to
a 13-fold preference for Lys/Arg over Phe; the others simply resulted in less active enzymes having substrate
preferences similar to wild-type. In the case of trypsin, a protease that is highly specific for basic P1 residues,
recruitment of chymotrypsin-like (hydrophobic P1) specificity required not only mutation of the ion
pair-forming Asp189 to Ser, but also transplantation of two more distant surface loops from chymotrypsin
(Graf, L., Jancso, A., Szilagyi, L., Hegyi, G., Pinter, K., Naray-Szabo, G., Hepp, J., Medzihradszky, and
Rutter, W. J., Proc. Natl. Acad. Sci. USA (1988) 85:4961-4965 and Hedstrom, L., Szilagyi, L. and Rutter,
W. J., Science (1992) 255:1249-1253).

In the present work, we have also verified that relatively little specificity is gained by introducing single
ion pairs between enzyme and substrate. However, when two or more selected ionic interactions were
simultaneously engineered into subtilisin BPN', the resulting variants had higher specificity for basic residues
in each of the subsites, owing to a non-additive effect.

Accordingly, it is an object to produce a subtilisin variant with basic specificity for use in processing 
pro-proteins made by recombinant techniques. 

SUMMARY OF THE INVENTION 

The present invention provides subtilisin variants with altered substrate specificity. Preferred
subtilisin variants are highly specific for the efficient cleavage of substrates containing basic residues. The
subtilisin variants have a substrate specificity which is substantially different from the substrate specificity of the
precursor subtilisin from which the amino acid sequence of the variant is derived. The amino acid sequence of
each subtilisin variant is derived by the substitution of one or more amino acids of a precursor subtilisin amino
acid sequence.

In a preferred aspect of the present invention, the subtilisin variants are specific
for the cleavage of protein substrates containing basic amino acid residues at substrate positions P1, P2 and P4.
According to this aspect of the present invention, subtilisin variants having amino acid substitutions at positions
corresponding to amino acid positions 62, 104 and 166 of subtilisin BPN' produced by Bacillus
amyloliquefaciens are preferred. Accordingly, subtilisin variants are provided wherein amino acids 62, 104 and
166 of subtilisin BPN' are substituted with acidic amino acids. Preferably the acidic amino acid is Asp or Glu,
and most preferably Asp.
Preferred substrates for the subtilisin variants according to this aspect of the present invention contain
either Lys (K) or Arg (R) at substrate positions P2 and P1, practically any residue at P3, preferably either
Lys or Arg at P4, and again practically any residue at P5. Thus an exemplary good substrate would contain
-Asn-Arg-Met-Arg-Lys- (SEQ ID NO: 76) at -P5-P4-P3-P2-P1-, respectively. Additionally, good substrates would
not have Pro at P1', P2' or P3', nor would Ile be present at P1'.
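The qualitative preferences above can be restated as a screening rule for candidate cleavage sites. The Python sketch below is a hypothetical helper (the function name and its three labels are assumptions, not part of the invention); it requires Lys or Arg at P2 and P1, treats a basic P4 as preferred rather than required, and rejects Pro at P1' to P3' and Ile at P1'. It encodes only these stated preferences and does not predict kcat/Km.

```python
# Hypothetical screening rule for variants such as N62D/Y104D/G166D,
# encoding only the qualitative preferences stated in the text.
BASIC = {"K", "R"}


def score_triple_variant_site(window: str) -> str:
    """Classify an 8-residue window P5-P4-P3-P2-P1-P1'-P2'-P3'.

    Returns 'poor' if P2 or P1 is not basic, if Pro occupies P1', P2' or P3',
    or if Ile occupies P1'; 'preferred' if, in addition, P4 is basic;
    otherwise 'acceptable'. P5 and P3 may be practically any residue.
    """
    if len(window) != 8:
        raise ValueError("expected 8 residues: P5 P4 P3 P2 P1 P1' P2' P3'")
    _p5, p4, _p3, p2, p1, p1p, p2p, p3p = window.upper()
    if p2 not in BASIC or p1 not in BASIC:
        return "poor"
    if "P" in (p1p, p2p, p3p) or p1p == "I":
        return "poor"
    return "preferred" if p4 in BASIC else "acceptable"


print(score_triple_variant_site("NRMRKAAA"))  # preferred
print(score_triple_variant_site("NLMRKAAA"))  # acceptable
print(score_triple_variant_site("NRMRKPAA"))  # poor
```

The first window is SEQ ID NO: 76 (P5 to P1) followed by an arbitrary Ala-Ala-Ala at P1' to P3'; the other two windows are hypothetical variations of it.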

According to a second aspect of the present invention, the subtilisin variants are capable of cleaving
protein substrates having basic residues at positions P1 and P2. According to this aspect of the present invention,
subtilisin variants having amino acid substitutions at positions corresponding to amino acid positions 62 and 166
of subtilisin BPN' produced by Bacillus amyloliquefaciens are preferred. The preferred subtilisin variants having
substrate specificity for dibasic substrates have an acidic amino acid residue at residue position 62 of the subtilisin
naturally produced by Bacillus amyloliquefaciens. In a preferred embodiment, the naturally occurring Asn at
residue position 62 of subtilisin BPN' is substituted with an acidic amino acid residue such as Glu or
Asp, and most preferably Asp. The preferred subtilisin variants having substrate specificity for substrates having
dibasic amino acid residues additionally have an acidic residue, Asp or Glu, at residue position 166 of subtilisin
BPN'. Thus, subtilisin BPN' variants containing substitution of amino acids 62 and 166 with the acidic amino
acids Glu or Asp are preferred. In particular, a subtilisin variant having Asp at positions 62 and 166
is preferred (subtilisin BPN' variant N62D/G166D). The subtilisin variants according to this aspect of the
invention may be used to cleave substrates containing dibasic residues, such as fusion proteins with dibasic
substrate linkers, and to process hormones or other proteins (in vitro or in vivo) that contain dibasic cleavage
sites.

Preferred substrates for the subtilisin BPN' variant N62D/G166D contain either Lys (K) or Arg (R) at
substrate positions P2 and P1, practically any residue at P3, a non-charged hydrophobic residue at P4, and again
practically any residue at P5. Thus an exemplary good substrate would contain -Asn-Leu-Met-Arg-Lys-
(SEQ ID NO: 35) at -P5-P4-P3-P2-P1-, respectively. Additionally, good substrates would not have Pro at P1',
P2' or P3', nor would Ile be present at P1'.
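For the N62D/G166D variant, a similar rule can be used to scan a longer sequence for candidate sites. This sketch assumes a particular set of uncharged hydrophobic residues for P4 (Ala, Val, Leu, Ile, Met, Phe, Trp), since the text does not enumerate them; the function name is hypothetical, and the positions it returns are only candidates suggested by the stated preferences.

```python
from typing import List

BASIC = {"K", "R"}
# Assumption: an illustrative set of uncharged hydrophobic residues for P4.
# The text specifies only "a non-charged hydrophobic residue".
HYDROPHOBIC = {"A", "V", "L", "I", "M", "F", "W"}


def candidate_sites_dibasic(seq: str) -> List[int]:
    """Return 0-based indices of P1 residues whose following peptide bond
    fits the stated N62D/G166D preferences (P4 hydrophobic, P2 and P1 basic,
    no Pro at P1'-P3', no Ile at P1')."""
    seq = seq.upper()
    sites = []
    # P4 sits at i - 3 and P3' at i + 3, so only interior positions qualify.
    for i in range(3, len(seq) - 3):
        p4, p2, p1 = seq[i - 3], seq[i - 1], seq[i]
        p1p, p2p, p3p = seq[i + 1], seq[i + 2], seq[i + 3]
        if (p1 in BASIC and p2 in BASIC and p4 in HYDROPHOBIC
                and "P" not in (p1p, p2p, p3p) and p1p != "I"):
            sites.append(i)
    return sites


# Hypothetical sequence containing -Asn-Leu-Met-Arg-Lys- (SEQ ID NO: 35).
print(candidate_sites_dibasic("GSNLMRKAAGS"))  # [6]
```

Index 6 is the Lys at P1, so the predicted scissile bond lies between that Lys and the following Ala.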

The invention also includes mutant DNA sequences encoding such subtilisin variants. These mutant
DNA sequences are derived from a precursor DNA sequence which encodes a naturally occurring or recombinant
precursor subtilisin. The mutant DNA sequence is derived by modifying the precursor DNA sequence to encode
the substitution(s) of one or more amino acids encoded by the precursor DNA sequence. These recombinant
DNA sequences encode mutants having an amino acid sequence which does not exist in nature and a substrate
specificity which is substantially different from the substrate specificity of the precursor subtilisin encoded by
the precursor DNA sequence.

Further, the invention includes expression vectors containing such mutant DNA sequences, as well as host
cells transformed with such vectors which are capable of expressing the subtilisin variants.

The invention also provides a process for cleaving a polypeptide, such as a fusion protein, containing
a substrate linker represented by the formula:
P4-P3-P2-P1 

wherein P4 is a basic amino acid or a large hydrophobic amino acid such as Leu or Met; P3 is any of the
naturally occurring amino acids; P2 is a basic amino acid; and P1 is a basic amino acid.

WO 96/27671 PCT/US96/02861

The process includes the step of subjecting the polypeptide to a subtilisin variant described herein under
conditions such that the subtilisin variant cleaves the polypeptide.
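As a sketch only, the linker formula can be expressed as a regular expression and used to predict the two products released when a fusion protein is cut after P1. The function name and the test sequences are hypothetical; for P4 only Lys, Arg and the two large hydrophobic residues named in the text (Leu, Met) are accepted, and sites with Pro at P1' are skipped because Figure 5 reports that such substrates are not cleaved. Only the first matching linker is used.

```python
import re
from typing import Optional, Tuple

# Linker formula P4-P3-P2-P1 from the text: P4 basic (K, R) or a large
# hydrophobic residue (only the two examples given, L and M, are used here);
# P3 any residue; P2 and P1 basic. The negative lookahead (?!P) skips sites
# with Pro at P1', which Figure 5 reports are not cleaved.
LINKER = re.compile(r"[KRLM][A-Z][KR][KR](?!P)")


def cleave_at_linker(fusion: str) -> Optional[Tuple[str, str]]:
    """Split a fusion sequence at the P1-P1' bond of the first linker match.

    Returns (N-terminal fragment, released product), or None if no linker
    matching the formula is found.
    """
    match = LINKER.search(fusion.upper())
    if match is None:
        return None
    cut = match.end()  # index of P1', i.e. just past P1
    return fusion[:cut], fusion[cut:]


print(cleave_at_linker("GGGGSLMRKAEGTF"))  # ('GGGGSLMRK', 'AEGTF')
```

A real fusion protein may contain additional dibasic sites inside the desired product; this sketch does not check for them.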



BRIEF DESCRIPTION OF THE FIGURES 
Figure 1. Structure of a succinyl-Ala-Ala-Pro-BoroPhe (SEQ ID NO: 69) inhibitor bound to the active
site of subtilisin BPN' showing the S2 and S1 binding pocket residues subjected to mutagenesis.

Figure 2. Kinetic analysis of S1 binding site subtilisin mutants versus substrates having variable P1
residues. The kinetic constant kcat/Km was determined from plots of initial rates versus substrate concentration
for the tetrapeptide series succinyl-Ala-Ala-Pro-Xaa-pNa (SEQ ID NO: 69), where Xaa was Lys (SEQ ID
NO: 58), Arg (SEQ ID NO: 59), Phe (SEQ ID NO: 56), Met (SEQ ID NO: 60) or Gln (SEQ ID NO: 61) (defined
to the right of the plot).

Figure 3. Kinetic analysis of S2 binding site subtilisin mutants versus substrates having variable P2
residues. The kinetic constant kcat/Km was determined from plots of initial rates versus substrate concentration
for the tetrapeptide series succinyl-Ala-Ala-Xaa-Phe-pNa (SEQ ID NO: 70), where Xaa was Lys (SEQ ID
NO: 62), Arg (SEQ ID NO: 64), Ala (SEQ ID NO: 63), Pro (SEQ ID NO: 56) or Asp (SEQ ID NO: 65) (defined
on the right of the plot).

Figure 4. Kinetic analysis of combined S1 and S2 binding site subtilisin mutants versus substrates having
variable P1 and P2 residues. The kinetic constants kcat/Km were determined from plots of initial rates versus
substrate concentration for the tetrapeptide series succinyl-Ala-Ala-Xaa2-Xaa1-pNa (SEQ ID NO: 71), where
Xaa2-Xaa1 was Lys-Lys (SEQ ID NO: 66), Lys-Arg (SEQ ID NO: 67), Lys-Phe (SEQ ID NO: 62), Pro-Lys
(SEQ ID NO: 58), Pro-Phe (SEQ ID NO: 56) or Ala-Phe (SEQ ID NO: 63) (defined on the right of the plot).

Figure 5. Results of hGH-AP fusion protein assay. hGH-AP fusion proteins were constructed, bound to
hGHbp-coupled resin, and treated with 0.5 nM N62D/G166D subtilisin in 20 mM Tris-Cl pH 8.2. Aliquots were
withdrawn at various times and AP release was monitored by activity assay in comparison to a standard curve.
Arrows indicate the cleavage site. The rate of cleavage of fusion proteins containing various substrate linkers is
shown. Substrates containing a Pro at position P1' are not cleaved.

Figures 6-1 to 6-10 (collectively referred to herein as Fig. 6). DNA sequence of the phagemid pSS5
containing the N62D/G166D double mutant subtilisin BPN' gene (SEQ ID NO: 1), and translated amino acid
sequence for the mutant preprosubtilisin (SEQ ID NO: 2). The pre region comprises residues -107 to -78,
the pro region residues -77 to -1, and the mature enzyme residues +1 to +275 (SEQ ID NO: 72). Also shown are
restriction sites recognized by endonucleases that require 6 or more specific bases in succession.

Figure 7. Structure of a succinyl-Ala-Ala-Pro-BoroPhe (SEQ ID NO: 69) inhibitor bound to the active
site of subtilisin BPN' showing the S1, S2 and S4 binding pocket residues subjected to mutagenesis.

Figure 8. DNA sequence of the N62D/Y104D/G166D triple mutant (SEQ ID NO: 74) as well as the
translated amino acid sequence (SEQ ID NO: 75). The pre region comprises residues -107 to -78, the pro region
residues -77 to -1 and the mature enzyme residues +1 to +275. The pro region reflects the changes
A(-4)R/A(-2)K/Y(-1)R made in the wild-type processing site to effect expression.
DETAILED DESCRIPTION OF THE INVENTION

Definitions 

Terms used in the claims and specification are defined as set forth below unless otherwise specified.
The term amino acid or amino acid residue, as used herein, refers to naturally occurring L-amino acids or
residues, unless otherwise specifically indicated. The commonly used one- and three-letter abbreviations for
amino acids are used herein (Lehninger, A. L., Biochemistry, 2d ed., pp. 71-92, Worth Publishers, N.Y. (1975)).
Basic amino acids are Arg and Lys. Acidic amino acids are Asp and Glu.

Substrates are described in three-letter or single-letter code as Pn...P2-P1-P1'-P2'...Pn'. The "P1" residue
refers to the position preceding (i.e., N-terminal to) the scissile peptide bond (i.e., the bond between the P1
and P1' residues) of the substrate, as defined by Schechter and Berger (Schechter, I. and Berger, A., Biochem.
Biophys. Res. Commun. 27:157-162 (1967)). Similarly, the term P1' is used to refer to the position following
(i.e., C-terminal to) the scissile peptide bond of the substrate. Increasing numbers refer to the next consecutive
positions preceding (e.g., P2 and P3) and following (e.g., P2' and P3') the scissile bond. According to the present
invention, the scissile peptide bond is the bond that is cleaved by the subtilisin variants of the instant invention.
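The Schechter and Berger numbering can be made concrete with a short labeling function. In this Python sketch (a hypothetical helper, not taken from the cited paper) the scissile bond is identified by the index of its P1 residue, and positions are counted outward from that bond in both directions.

```python
from typing import Dict


def schechter_berger_labels(peptide: str, p1_index: int) -> Dict[str, str]:
    """Label each residue of `peptide` with its Schechter-Berger position.

    `p1_index` is the 0-based index of the P1 residue; the scissile bond lies
    between peptide[p1_index] (P1) and peptide[p1_index + 1] (P1').
    """
    if not 0 <= p1_index < len(peptide) - 1:
        raise ValueError("the scissile bond must lie inside the peptide")
    labels = {}
    for i, residue in enumerate(peptide):
        if i <= p1_index:
            # P1, P2, P3, ... counting toward the N-terminus
            labels[f"P{p1_index - i + 1}"] = residue
        else:
            # P1', P2', ... counting toward the C-terminus
            labels[f"P{i - p1_index}'"] = residue
    return labels


print(schechter_berger_labels("NRMRKAG", 4))
```

For the hypothetical peptide NRMRKAG cut after the Lys, the function labels Lys as P1, Ala as P1', Gly as P2' and Asn as P5.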
"Subtilisins," "precursor subtilisin" and the like are bacterial carbonyl hydrolases which generally act to
cleave peptide bonds of proteins or peptides. As used herein, "subtilisin" means a naturally occurring subtilisin
or a recombinant subtilisin. A series of naturally occurring subtilisins is known to be produced and often
secreted by various bacterial species (Siezen, R.J., et al. (1991) Protein Engineering 4:719-737). The amino acid
sequences of the members of this series are not entirely homologous. However, the subtilisins in this series
exhibit the same or similar type of proteolytic activity. This class of serine proteases shares a common amino
acid sequence defining a catalytic triad which distinguishes them from the chymotrypsin-related class of serine
proteases. The subtilisins and chymotrypsin-related serine proteases both have a catalytic triad comprising
aspartate, histidine and serine. In the subtilisin-related proteases the relative order of these amino acids, reading
from the amino to the carboxy terminus, is aspartate-histidine-serine. In the chymotrypsin-related proteases the
relative order, however, is histidine-aspartate-serine. Thus, subtilisins as used herein refer to serine proteases
having the catalytic triad of subtilisin-related proteases.
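Because the two families are distinguished here by the order of the triad residues along the chain, the distinction reduces to sorting three residue numbers. The sketch below is a hypothetical helper that assumes the caller already knows the triad positions; Asp32/His64/Ser221 is the subtilisin BPN' triad given later in this document, and His57/Asp102/Ser195 is the conventional chymotrypsin numbering.

```python
from typing import Dict


def triad_family(triad: Dict[str, int]) -> str:
    """Classify a serine protease by the N- to C-terminal order of its
    catalytic triad; `triad` maps 'Asp', 'His' and 'Ser' to residue numbers."""
    order = tuple(sorted(triad, key=lambda name: triad[name]))
    if order == ("Asp", "His", "Ser"):
        return "subtilisin-related"
    if order == ("His", "Asp", "Ser"):
        return "chymotrypsin-related"
    return "unclassified"


print(triad_family({"Asp": 32, "His": 64, "Ser": 221}))   # subtilisin-related
print(triad_family({"His": 57, "Asp": 102, "Ser": 195}))  # chymotrypsin-related
```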

Generally, subtilisins are serine endoproteases having molecular weights of about 27,500 which are
secreted in large amounts from a wide variety of Bacillus species. The protein sequences of subtilisins have been
determined from at least four different species of Bacillus (Markland, F.S., et al. (1971) in The Enzymes, ed.
Boyer, P.D., Academic Press, New York, Vol. III, pp. 561-608; and Nedkov, P. et al. (1983) Hoppe-Seyler's Z.
Physiol. Chem. 364:1537-1540). The three-dimensional crystallographic structures of four subtilisins have been
reported (BPN' from Bacillus amyloliquefaciens, Hirono et al. (1984) J. Mol. Biol. 178:389-413; subtilisin
Carlsberg from Bacillus licheniformis, Bode et al. (1986) EMBO J. 5:813-818; thermitase from
Thermoactinomyces vulgaris, Gros et al. (1989) J. Mol. Biol. 210:347-367; and proteinase K from Tritirachium
album, Betzel et al. (1988) Acta Crystallogr. B 44:163-172). The three-dimensional structure of subtilisin
BPN' (from B. amyloliquefaciens) at 2.5 Å resolution has also been reported by Wright, C.S. et al. (1969) Nature
221:235-242 and Drenth, J. et al. (1972) Eur. J. Biochem. 26:177-181. These studies indicate that although
subtilisin is genetically unrelated to the mammalian serine proteases, it has a similar fold and active site structure.
The x-ray crystal structures of subtilisin containing covalently bound peptide inhibitors (Robertus, J.D., et al.
(1972) Biochemistry 11:2439-2449), product complexes (Robertus, J.D., et al. (1972) Biochemistry
11:4293-4303) and transition state analogs (Matthews, D.A., et al. (1975) J. Biol. Chem. 250:7120-7126 and
Poulos, T.L., et al. (1976) J. Biol. Chem. 251:1097-1103) which have been reported have also provided
information regarding the active site and putative substrate binding cleft of subtilisins. In addition, a large
number of kinetic and chemical modification studies have been reported for subtilisins (Philipp, M., et al. (1983)
Mol. Cell. Biochem. 51:5-32; Svendsen, I.B. (1976) Carlsberg Res. Commun. 41:237-291 and Markland, F.S.,
Id.), as well as at least one report wherein the side chain of methionine at residue 222 of subtilisin was converted
by hydrogen peroxide to methionine sulfoxide (Stauffer, D.C., et al. (1965) J. Biol. Chem. 244:5333-5338).

"Subtilisin variant," "subtilisin mutant" and the like refer to a subtilisin-type serine protease having a
sequence, not found in nature, that is derived from a precursor subtilisin according to the present
invention. The subtilisin variant has a substrate specificity different from that of the precursor subtilisin by virtue of
amino acid substitutions within the precursor subtilisin amino acid sequence. The term is meant to include
subtilisin variants in which the DNA sequence encoding the precursor subtilisin is modified to produce a mutant
DNA sequence which encodes the substitution of one or more amino acids in the naturally occurring subtilisin
amino acid sequence. Suitable methods to produce such modifications include those disclosed in U.S. Patent Nos.
4,760,025 and 5,371,008 and in EPO Publication Nos. 0130756 and 0251446.

A change in substrate specificity is defined as a difference between the kcat/Km ratio of the precursor
subtilisin and that of the subtilisin variant. The kcat/Km ratio is a measure of catalytic efficiency. Subtilisin
variants with increased or decreased kcat/Km ratios compared to the precursor subtilisin from which they were
derived are described herein. Generally, the objective is to secure a variant having a greater (i.e., numerically
larger) kcat/Km ratio for a given substrate. A greater kcat/Km ratio for a particular substrate indicates that the
variant may be used to cleave the target substrate more efficiently.

The specificity, or discrimination between two or more competing substrates, is determined by the ratios
of their kcat/Km values (Fersht, A.R. (1985) Enzyme Structure and Mechanism, W.H. Freeman and Co., N.Y.,
p. 112). An increase in the kcat/Km ratio for one substrate may be accompanied by a reduction in the kcat/Km
ratio for another substrate. Such a shift in substrate specificity indicates that the variant subtilisin with the
increased kcat/Km ratio for the substrate is more useful than the precursor subtilisin for cleaving that particular
substrate, for example, in preventing undesirable hydrolysis of a particular substrate in a mixture of substrates.
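The ratio-of-ratios logic can be illustrated numerically. The kcat/Km values in this Python sketch are invented for illustration and are not measurements from this work; the hypothetical function `specificity` gives the A-over-B discrimination of one enzyme, and `specificity_shift` compares a variant with its precursor.

```python
from typing import Tuple


def specificity(kcat_km_a: float, kcat_km_b: float) -> float:
    """Discrimination between competing substrates A and B: the ratio of
    their kcat/Km values (both in M^-1 s^-1)."""
    return kcat_km_a / kcat_km_b


def specificity_shift(variant: Tuple[float, float],
                      precursor: Tuple[float, float]) -> float:
    """Fold change in A-over-B specificity of a variant relative to its
    precursor; each argument is (kcat/Km for A, kcat/Km for B)."""
    return specificity(*variant) / specificity(*precursor)


# Hypothetical values: A is a dibasic substrate, B a hydrophobic one.
precursor = (1.0e3, 1.0e5)
variant = (1.0e5, 1.0e4)
print(specificity(*variant))  # 10.0
print(specificity_shift(variant, precursor))
```

With these made-up numbers the precursor favors B 100-fold, the variant favors A 10-fold, and the specificity shift is therefore about 1000-fold in favor of A.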

In general, for a subtilisin variant to have a useful catalytic efficiency for cleavage of a particular substrate,
the kcat/Km ratio will generally be between about 1 x 10^5 M^-1 s^-1 and about 1 x 10^7 M^-1 s^-1. More
often, the kcat/Km ratio will be between about 1 x 10^4 M^-1 s^-1 and 1 x 10* M^-1 s^-1.

When referring to mutants or variants, the wild type amino acid residue is followed by the residue number 
and the new or substituted amino acid residue. For example, substitution of D for wild type N in residue position 
62 is denominated N62D. 

"Subtilisin variants or mutants" are designated in the same manner, using the single-letter amino acid
code for the wild-type residue followed by its position and the single-letter amino acid code for the replacement
residue. Multiple mutants are indicated by component single mutants separated by slashes. Thus the subtilisin
BPN' variant N62D/G166D is a di-substituted variant in which Asp replaces Asn and Gly at residue positions
62 and 166, respectively, in wild-type subtilisin BPN'.
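The naming convention is regular enough to parse mechanically. The Python sketch below (hypothetical names) splits a multiple mutant at the slashes and reads each wild-type residue, position and replacement; as an extension not defined in the text, it also accepts parenthesized negative positions such as A(-4)R, the style used for the prosequence changes in Figure 8.

```python
import re
from typing import List, NamedTuple


class Substitution(NamedTuple):
    wild_type: str    # one-letter code of the wild-type residue
    position: int     # residue number; negative numbers denote prosequence sites
    replacement: str  # one-letter code of the substituted residue


AA = "ACDEFGHIKLMNPQRSTVWY"
# One substitution, e.g. "N62D"; parentheses around the number, as in the
# "A(-4)R" style used for prosequence changes, are also accepted.
SINGLE = re.compile(rf"([{AA}])\(?(-?\d+)\)?([{AA}])")


def parse_variant(name: str) -> List[Substitution]:
    """Parse a slash-separated variant name such as 'N62D/G166D'."""
    subs = []
    for part in name.split("/"):
        match = SINGLE.fullmatch(part.strip())
        if match is None:
            raise ValueError(f"not a single substitution: {part!r}")
        wt, pos, new = match.groups()
        subs.append(Substitution(wt, int(pos), new))
    return subs


print(parse_variant("N62D/G166D"))
```

Parsing "N62D/G166D" yields two substitutions, Asn62 to Asp and Gly166 to Asp, matching the description of this variant above.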
An amino acid residue of a precursor carbonyl hydrolase is "equivalent" to a residue of B.
amyloliquefaciens subtilisin if it is either homologous (i.e., corresponding in position in either primary or tertiary
structure) or analogous to a specific residue or portion of that residue in B. amyloliquefaciens subtilisin (i.e.,
having the same or similar functional capacity to combine, react, or interact chemically).
In order to establish homology to primary structure, the amino acid sequence of a precursor carbonyl
hydrolase is directly compared to the B. amyloliquefaciens subtilisin primary sequence and particularly to a set
of residues known to be invariant in all subtilisins for which the sequences are known (see, e.g., Figure 5-C in
EPO 0251446). After aligning the conserved residues, allowing for necessary insertions and deletions in order to
maintain alignment (i.e., avoiding the elimination of conserved residues through arbitrary deletion and insertion),
the residues equivalent to particular amino acids in the primary sequence of B. amyloliquefaciens subtilisin are
defined. Alignment of conserved residues should conserve 100% of such residues. However, alignment of
greater than 75% or as little as 50% of conserved residues is also adequate to define equivalent residues.
Conservation of the catalytic triad, Asp32/His64/Ser221, is required.
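Once a pairwise alignment is available, reading off equivalent residues and checking the conservation criteria is bookkeeping. This Python sketch assumes the alignment was produced elsewhere (no aligner is included); the invariant residue set of EPO 0251446 is not reproduced, so the caller supplies it, and the toy sequences in the example are hypothetical. Function names are illustrative only.

```python
from typing import Dict, Iterable, Sequence, Tuple

# reference residue number -> (query residue number, query residue letter)
EquivMap = Dict[int, Tuple[int, str]]

BPN_TRIAD = ((32, "D"), (64, "H"), (221, "S"))  # Asp32/His64/Ser221


def equivalent_residues(query_aln: str, ref_aln: str) -> EquivMap:
    """Map reference residue numbers to the query residues aligned with them.

    Both arguments are rows of one pairwise alignment, with '-' for gaps;
    residues are numbered from 1 in each ungapped sequence.
    """
    if len(query_aln) != len(ref_aln):
        raise ValueError("alignment rows must have equal length")
    equiv: EquivMap = {}
    q_num = r_num = 0
    for q, r in zip(query_aln, ref_aln):
        if q != "-":
            q_num += 1
        if r != "-":
            r_num += 1
        if q != "-" and r != "-":
            equiv[r_num] = (q_num, q)
    return equiv


def conserved_fraction(equiv: EquivMap, ref_seq: str,
                       invariant: Iterable[int]) -> float:
    """Fraction of invariant reference positions matched by an identical
    query residue (compare with the 100%, >75% and 50% levels in the text)."""
    positions = list(invariant)
    hits = sum(1 for n in positions
               if n in equiv and equiv[n][1] == ref_seq[n - 1])
    return hits / len(positions)


def triad_conserved(equiv: EquivMap,
                    triad: Sequence[Tuple[int, str]] = BPN_TRIAD) -> bool:
    """True if every catalytic-triad position aligns with the same residue."""
    return all(n in equiv and equiv[n][1] == aa for n, aa in triad)


# Toy alignment: query MADHGS against a hypothetical reference ADHAGS.
equiv = equivalent_residues("MADH-GS", "-ADHAGS")
print(conserved_fraction(equiv, "ADHAGS", [2, 3, 4, 6]))  # 0.75
```

In the toy case the reference residue 4 falls opposite a gap, so three of the four chosen invariant positions are conserved, which would meet the 50% level but not the 75% level described above.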

Equivalent residues homologous at the level of tertiary structure for a precursor carbonyl hydrolase whose
tertiary structure has been determined by x-ray crystallography are defined as those for which the atomic
coordinates of 2 or more of the main chain atoms of a particular amino acid residue of the precursor carbonyl
hydrolase and B. amyloliquefaciens subtilisin (N on N, CA on CA, C on C, and O on O) are within 0.13 nm and
preferably 0.1 nm after alignment. Alignment is achieved after the best model has been oriented and positioned
to give the maximum overlap of atomic coordinates of non-hydrogen protein atoms of the carbonyl hydrolase
in question to the B. amyloliquefaciens subtilisin. The best model is the crystallographic model giving the lowest
R factor for experimental diffraction data at the highest resolution available.

$$\text{R factor} = \frac{\sum_{h} \bigl|\, |F_o(h)| - |F_c(h)| \,\bigr|}{\sum_{h} |F_o(h)|}$$
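The R factor can be computed directly from paired observed and calculated structure-factor amplitudes. This is a minimal Python sketch with three hypothetical reflections; in practice the amplitudes would come from a refinement program, and no particular file format is assumed.

```python
from typing import Sequence


def r_factor(f_obs: Sequence[float], f_calc: Sequence[float]) -> float:
    """R = sum_h ||Fo(h)| - |Fc(h)|| / sum_h |Fo(h)|, summed over reflections h."""
    if len(f_obs) != len(f_calc):
        raise ValueError("need one calculated amplitude per observed reflection")
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    denominator = sum(abs(fo) for fo in f_obs)
    return numerator / denominator


# Three hypothetical reflections.
print(r_factor([100.0, 50.0, 10.0], [90.0, 55.0, 10.0]))  # 0.09375
```

A lower R factor means closer agreement between model and data, which is why the model with the lowest R factor is taken as the best model.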



Equivalent amino acid residues of subtilisin BPN', subtilisin Carlsberg, thermitase and proteinase K from tertiary
structure analysis are provided in, for example, Siezen et al. (1991) Prot. Eng. 4:719-737.

Equivalent residues which are functionally analogous to a specific residue of B. amyloliquefaciens
subtilisin are defined as those amino acids of the precursor carbonyl hydrolases which may adopt a conformation
such that they either alter, modify or contribute to protein structure, substrate binding or catalysis in a manner
defined and attributed to a specific residue of the B. amyloliquefaciens subtilisin as described herein. Further,
they are those residues of the precursor carbonyl hydrolase (for which a tertiary structure has been obtained by
x-ray crystallography) which occupy an analogous position to the extent that, although the main chain atoms of
the given residue may not satisfy the criteria of equivalence on the basis of occupying a homologous position,
the atomic coordinates of at least two of the side chain atoms of the residue lie within 0.13 nm of the
corresponding side chain atoms of B. amyloliquefaciens subtilisin. The three-dimensional structures would be
aligned as outlined above.
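Given two already superposed structures, both distance criteria come down to counting atom pairs within 0.13 nm. This Python sketch (hypothetical names, Python 3.8+ for `math.dist`) does not perform the superposition itself; it also assumes that "corresponding" side-chain atoms can be matched by atom name (CB, CG, ...), which is a simplification when the two residue types differ. The coordinates are invented.

```python
import math
from typing import Dict, Iterable, Tuple

Coord = Tuple[float, float, float]
Residue = Dict[str, Coord]  # atom name -> (x, y, z) in nm, after superposition
MAIN_CHAIN = {"N", "CA", "C", "O"}


def count_close(query: Residue, ref: Residue, atoms: Iterable[str],
                cutoff_nm: float) -> int:
    """Count atoms named in `atoms`, present in both residues, whose
    positions lie within cutoff_nm of each other (N on N, CA on CA, ...)."""
    shared = set(atoms) & set(query) & set(ref)
    return sum(1 for name in shared
               if math.dist(query[name], ref[name]) <= cutoff_nm)


def structural_equivalence(query: Residue, ref: Residue,
                           cutoff_nm: float = 0.13) -> str:
    """Return 'homologous', 'analogous' or 'none' for one residue pair."""
    if count_close(query, ref, MAIN_CHAIN, cutoff_nm) >= 2:
        return "homologous"
    side_chain = set(query) - MAIN_CHAIN
    if count_close(query, ref, side_chain, cutoff_nm) >= 2:
        return "analogous"
    return "none"


# Hypothetical, already-superposed coordinates (nm) for a reference residue.
REF_EXAMPLE: Residue = {
    "N": (0.0, 0.0, 0.0), "CA": (0.15, 0.0, 0.0), "C": (0.30, 0.0, 0.0),
    "O": (0.35, 0.10, 0.0), "CB": (0.15, 0.15, 0.0), "CG": (0.20, 0.30, 0.0),
}
shifted = {k: (x, y, z + 0.05) for k, (x, y, z) in REF_EXAMPLE.items()}
print(structural_equivalence(shifted, REF_EXAMPLE))  # homologous
```

Passing `cutoff_nm=0.1` applies the preferred, tighter main-chain criterion.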

Some of the residues identified for substitution are conserved residues whereas others are not. In the case
of residues which are not conserved, the replacement of one or more amino acids is limited to substitutions which
produce a mutant which has an amino acid sequence that does not correspond to one found in nature. In the case
of conserved residues, such replacements should not result in a naturally occurring sequence. The subtilisin
mutants of the present invention include the mature forms of subtilisin mutants as well as the pro- and
prepro-forms of such subtilisin mutants. The prepro-forms are the preferred construction since this facilitates the
expression, secretion and maturation of the subtilisin mutants.
"Prosequence" refers to a sequence of amino acids bound to the N-terminal portion of the mature form of
a subtilisin which, when removed, results in the appearance of the "mature" form of the subtilisin. Many
proteolytic enzymes are found in nature as translational proenzyme products and, in the absence of
post-translational processing, are expressed in this fashion. The preferred prosequence for producing subtilisin
mutants, specifically subtilisin BPN' mutants, is the putative prosequence of B. amyloliquefaciens subtilisin,
although other subtilisin prosequences may be used. For example, when the substrate specificity of the precursor
subtilisin is altered according to the present invention, this alteration may affect the ability of the variant
subtilisin to undergo autolytic cleavage of the naturally occurring prosequence. In order to effect the expression
and proper folding of a mature variant subtilisin whose substrate specificity has been altered, it may be necessary
to alter the prosequence to correspond to the new or variant substrate specificity.
As an example, the substrate specificity of a particular subtilisin variant, N62D/Y104D/G166D, is distinct
from that of the precursor subtilisin from which it was derived. The subtilisin variant prefers substrates containing basic
residues at substrate positions corresponding to P4, P2, and P1. According to this aspect of the present invention,
the precursor prosequence, which was efficiently autolysed by the precursor subtilisin, is altered to correspond
to the substrate specificity of the variant subtilisin. Therefore, for the subtilisin variant N62D/Y104D/G166D the
prosequence would be altered to contain basic residues at positions -4, -2, and -1.

A "signal sequence" or "presequence" refers to any sequence of amino acids bound to the N-terminal
portion of a subtilisin or to the N-terminal portion of a prosubtilisin which may participate in the secretion of the
mature or pro forms of the subtilisin. This definition of signal sequence is a functional one, meant to include all
those amino acid sequences, encoded by the N-terminal portion of the subtilisin gene or other secretable carbonyl
hydrolases, which participate in the effectuation of the secretion of subtilisin or other carbonyl hydrolases under
native conditions. The present invention utilizes such sequences to effect the secretion of the subtilisin mutants
as defined herein.

A "prepro" form of a subtilisin mutant consists of the mature form of the subtilisin having a prosequence
operably linked to the amino-terminus of the subtilisin and a "pre" or "signal" sequence operably linked to the
amino-terminus of the prosequence.

"Expression vector" refers to a DNA construct containing a DNA sequence which is operably linked to
a suitable control sequence capable of effecting the expression of the DNA in a suitable host. Such control
sequences include a promoter to effect transcription, an optional operator sequence to control such transcription,
a sequence encoding suitable mRNA ribosome binding sites, and sequences which control termination of
transcription and translation. The vector may be a plasmid, a phage particle, or simply a potential genomic insert.
Once transformed into a suitable host, the vector may replicate and function independently of the host genome,
or may, in some instances, integrate into the genome itself. In the present specification, "plasmid" and "vector"
are sometimes used interchangeably, as the plasmid is the most commonly used form of vector at present.
However, the invention is intended to include such other forms of expression vectors which serve equivalent
functions and which are, or become, known in the art.

The "host cells" used in the present invention generally are procaryotic or eucaryotic hosts which
preferably have been manipulated by the methods disclosed in EPO Publication No. 0130756 or 0251446 or U.S.
Patent No. 5,371,008 to render them incapable of secreting enzymatically active endoprotease. A preferred host
cell for expressing subtilisin is the Bacillus strain BG2036, which is deficient in enzymatically active neutral
protease and alkaline protease (subtilisin). The construction of strain BG2036 is described in detail in EPO
Publication No. 0130756 and further described by Yang, M.Y., et al. (1984) J. Bacteriol. 160:15-21. Such host
cells are distinguishable from those disclosed in PCT Publication No. 03949, wherein enzymatically inactive
mutants of intracellular proteases in E. coli are disclosed. Other host cells for expressing subtilisin include
Bacillus subtilis var. I168 (EPO Publication No. 0130756).

Host cells are transformed or transfected with vectors constructed using recombinant DNA techniques.
Such transformed host cells are capable of either replicating vectors encoding the subtilisin mutants or expressing
the desired subtilisin mutant. In the case of vectors which encode the pre or prepro form of the subtilisin mutant,
such mutants, when expressed, are typically secreted from the host cell into the host cell medium.
"Operably linked", when describing the relationship between two DNA regions, simply means that they are
functionally related to each other. For example, a presequence is operably linked to a peptide if it functions as
a signal sequence, participating in the secretion of the mature form of the protein, most probably involving
cleavage of the signal sequence. A promoter is operably linked to a coding sequence if it controls the
transcription of the sequence; a ribosome binding site is operably linked to a coding sequence if it is positioned
so as to permit translation.

The genes encoding the naturally-occurring precursor subtilisin may be obtained in accord with the general
methods described in U.S. Patent No. 4,760,025 or EPO Publication No. 0130756. As can be seen from the
examples disclosed therein, the methods generally comprise synthesizing labeled probes having putative
sequences encoding regions of the hydrolase of interest, preparing genomic libraries from organisms expressing
the hydrolase, and screening the libraries for the gene of interest by hybridization to the probes. Positively
hybridizing clones are then mapped and sequenced.

The cloned subtilisin is then used to transform a host cell in order to express the subtilisin. The subtilisin
gene is then ligated into a high copy number plasmid. This plasmid replicates in hosts in the sense that it contains
the well-known elements necessary for plasmid replication: a promoter operably linked to the gene in question
(which may be supplied as the gene's own homologous promoter if it is recognized, i.e., transcribed, by the host),
a transcription termination and polyadenylation region (necessary for stability of the mRNA transcribed by the
host from the hydrolase gene in certain eucaryotic host cells) which is exogenous or is supplied by the
endogenous terminator region of the subtilisin gene and, desirably, a selection gene such as an antibiotic
resistance gene that enables continuous cultural maintenance of plasmid-infected host cells by growth in
antibiotic-containing media. High copy number plasmids also contain an origin of replication for the host,
thereby enabling large numbers of plasmids to be generated in the cytoplasm without chromosomal limitations.
However, it is within the scope herein to integrate multiple copies of the subtilisin gene into the host genome. This
is facilitated by procaryotic and eucaryotic organisms which are particularly susceptible to homologous
recombination.

Once the subtilisin gene has been cloned, a number of modifications are undertaken to enhance the use
of the gene beyond synthesis of the naturally-occurring precursor subtilisin. Such modifications include the
production of recombinant subtilisin as disclosed in U.S. Patent No. 5,371,008 or EPO Publication No. 0130756
and the production of subtilisin mutants described herein.

Mutant design and preparation.

A. Subtilisin Variants Capable of Cleaving Substrates Having Dibasic Residues. 

For the preparation of subtilisin variants capable of cleaving substrates containing dibasic residues, the 
following analysis was undertaken. 

A number of structures have been solved of subtilisin with a variety of inhibitors and transition state
analogs bound (Wright, C.S., Alden, R.A. and Kraut, J. (1969) Nature, 221:235-242; McPhalen, C.A. and
James, N.G. (1988) Biochemistry, 27:6582-6598; Bode, W., Papamokos, E., Musil, D., Seemueller, U. and Fritz,
M. (1986) EMBO J., 5:813-818; and Bott, R., Ultsch, M., Kossiakoff, A., Graycar, T., Katz, B. and Power, S.
(1988) J. Biol. Chem., 263:7895-7906). One of these structures (Figure 1) was used to locate residues that are
in close proximity to the side chains at the P1 and P2 positions of the substrate. Previous work had shown that
replacement of residues at positions 156 and 166 in the S1 binding site with various charged residues led to
improved specificity for complementarily charged substrates (Wells, J.A., Powers, D.B., Bott, R.R., Graycar,
T.P. and Estell, D.A. (1987) Proc. Natl. Acad. Sci. USA, 84:1219-1223). Although longer-range electrostatic
effects on substrate specificity have been noted (Russell, A.J. and Fersht, A.R. (1987) Nature, 328:496-500),
these were generally much smaller than local ones. Therefore, it seemed reasonable that local differences in
charge between subtilisin BPN' and the eukaryotic enzymes may account for the differences in specificity.

A detailed sequence alignment of 35 different subtilisin-like enzymes (Siezen, R.J., de Vos, W.M.,
Leunissen, A.M., and Dijkstra, B.W. (1991) Protein Eng., 4:719-737) allowed us to identify differences between
subtilisin BPN' and the eukaryotic processing enzymes KEX2, furin and PC2. Within the S1 binding pocket
there are a number of charged residues that appear in the pro-hormone processing enzymes and not in subtilisin
BPN' (Table 1A).
TABLE 1A
S1 subsite

Residues*          125-131           151-157           163-168
Subtilisin BPN'    SLGGPSG           AAAGNEG           ST-VGYP
                   (SEQ ID NO: 3)                      (SEQ ID NO: 5)
Kex2               SWGPADD           FASGNGG           CNYDGYT
                   (SEQ ID NO: 6)    (SEQ ID NO: 7)    (SEQ ID NO: 8)
Furin              SWGPEDD           WASGNGG           CNCDGYT
                   (SEQ ID NO: 9)    (SEQ ID NO: 10)   (SEQ ID NO: 11)
PC2                SWGPADD           WASGDGG           CNCDGYA
                   (SEQ ID NO: 6)    (SEQ ID NO: 12)   (SEQ ID NO: 13)

* Numbering according to the subtilisin BPN' sequence.



For example, the eukaryotic enzymes have two conserved Asp residues at 130 and 131, as well as an Asp at 165
that is preceded by insertion of a Tyr or Cys. However, in the region from 151-157, subtilisin BPN' contains a
Glu at position 156 where the eukaryotic enzymes have a conserved Gly.

In the S2 binding site there were two notable differences in sequence (Table 1B).



TABLE 1B
S2 subsite

Residues           30-35             60-64
Subtilisin BPN'    VIDSGI            DNNSH
                   (SEQ ID NO: 14)   (SEQ ID NO: 15)
KEX2               IVDDGL            SDDYH
                   (SEQ ID NO: 16)   (SEQ ID NO: 17)
Furin              ILDDGI            NDNRH
                   (SEQ ID NO: 18)   (SEQ ID NO: 19)
PC2                IMDDGI            WFNSH
                   (SEQ ID NO: 20)   (SEQ ID NO: 21)


Subtilisin BPN' contains a Ser at position 33, whereas the pro-hormone processing enzymes contain Asp. There
is not as clear a consensus in the region of 60-64, but one notable difference is at position 62. This side chain,
which points directly at the P2 side chain (Figure 1), is Asn in subtilisin BPN', furin and PC2 but Asp in KEX2.
Thus, not all substitutions were clearly predictive of the specificity differences.
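The position-by-position comparison behind Table 1B can be sketched as a simple scan. This minimal Python example assumes the segments are already aligned without gaps and numbered from the BPN' residue given as `start`; the helper `differing_positions` is hypothetical and only reproduces the bookkeeping, not the structural judgement described above.

```python
from typing import Dict, List, Tuple


def differing_positions(
    start: int, reference: str, others: Dict[str, str]
) -> List[Tuple[int, str, Dict[str, str]]]:
    """Return (position, reference residue, {enzyme: residue}) for every
    position at which at least one other enzyme differs from the reference.

    Segments must be gap-free and aligned residue-for-residue; `start` is the
    subtilisin BPN' number of the first residue."""
    for name, seq in others.items():
        if len(seq) != len(reference):
            raise ValueError(f"{name} segment differs in length from the reference")
    diffs = []
    for offset, ref_res in enumerate(reference):
        residues = {name: seq[offset] for name, seq in others.items()}
        if any(res != ref_res for res in residues.values()):
            diffs.append((start + offset, ref_res, residues))
    return diffs


s2_30_35 = differing_positions(30, "VIDSGI", {"KEX2": "IVDDGL", "Furin": "ILDDGI", "PC2": "IMDDGI"})
s2_60_64 = differing_positions(60, "DNNSH", {"KEX2": "SDDYH", "Furin": "NDNRH", "PC2": "WFNSH"})

for pos, ref_res, residues in s2_30_35 + s2_60_64:
    print(pos, ref_res, residues)
```

Position 33 comes out as Ser in BPN' against Asp in all three processing enzymes, and position 62 as Asn against Asp only in KEX2, in line with the text. The scan also flags uncharged differences (for example at positions 30 and 31), which is why the charge-based reasoning above is still needed to choose candidate substitutions.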

A variety of mutants were produced to probe and engineer the specificity of subtilisin BPN' using 
oligonucleotides described in Table 2. 
TABLE 2
Oligonucleotides used for site-directed mutagenesis on subtilisin

Mutant                      Specificity   Oligonucleotide
                            pocket
S33D                        S2            5'-GCGGTTATCGACG*A*CGGTATCGATTCT-3' (SEQ ID NO: 22)
S33K                        S2            5'-GCGGTTATCGACAA*A*GGTATCGATTCT-3' (SEQ ID NO: 23)
S33E                        S2            5'-GCGGTTATCGACG*A*A*GGTATCGATTCT-3' (SEQ ID NO: 24)
N62D                        S2            5'-CCAAGACAACG*ACTCTCACGGAA-3' (SEQ ID NO: 25)
N62S                        S2            5'-CCAAGACAACAG*CTCTCACGGAA-3' (SEQ ID NO: 26)
N62K                        S2            5'-CCAAGACAACAAA*TCTCACGGAA-3' (SEQ ID NO: 27)
G166D                       S1            5'-CACTTCCGGCAGCTCG*T*C*G*ACAGTGGA*C*TACCCTGGCAAATA-3'
                                          (SEQ ID NO: 28) (inserts SalI site)
G166E                       S1            5'-CACTTCCGGCAGCTCG*T*C*G*ACAGTGGA*G*TACCCTGGCAAATA-3'
                                          (SEQ ID NO: 29) (inserts SalI site)
G128P/P129A                 S1            5'-TTAACATGAGCCTCGGCC*C*AG*CTA*G*C*GGTTCTGCTGCTTTA-3'
                                          (SEQ ID NO: 30) (inserts NheI site)
G128P/P129A/S130D/G131D     S1            5'-TTAACATGAGCCTCGGCC*C*C*G*CGG*A*TGA*TTCTGCTGCTTTAAA-3'
                                          (SEQ ID NO: 31) (inserts SacII site)
T164N/V165D                 S1            5'-CGGCAGCTCAAGCA*A*C*G*A*T*GGCTAT*CCTGGCAAATACCCTTCTGTCA-3'
                                          (SEQ ID NO: 32) (inserts BsaBI site)
T164Y/V165D                 S1            5'-CGGCAGCTCAAGC...GGCTAT*CCT-3'
                                          (SEQ ID NO: 33) (inserts BsaBI site)
T164N-Y(insert)-V165D       S1            5'-...GGGTACCCTGGCAAATA-3'
                                          (SEQ ID NO: 34) (inserts BstBI site)
N62D/G166D                  S1/S2         See individual mutations
N62D/G166E                  S1/S2         See individual mutations

Asterisks indicate base changes from the pSS5 (wild-type) template.
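The asterisk convention used in Table 2 can be checked mechanically. The sketch below strips the marks, translates the oligonucleotide with the standard genetic code, and tests for the SalI site (GTCGAC) that the G166D oligonucleotide is said to introduce. It assumes the oligonucleotides are written on the coding strand and that the reading frame starts at offset 1; both were inferred by inspection and are not stated in the source. The helper names are hypothetical, and the sequences are as transcribed in Table 2.

```python
from itertools import product
from typing import List, Tuple

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def parse_marked_oligo(marked: str) -> Tuple[str, List[int]]:
    """Strip '*' marks (one after each changed base) and return the plain
    sequence plus the 0-based indices of the changed bases."""
    bases: List[str] = []
    changed: List[int] = []
    for ch in marked:
        if ch == "*":
            changed.append(len(bases) - 1)
        else:
            bases.append(ch)
    return "".join(bases), changed


def translate(seq: str, frame: int) -> str:
    """Translate every complete codon starting at 0-based offset `frame`."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(frame, len(seq) - 2, 3))


n62d, n62d_changes = parse_marked_oligo("CCAAGACAACG*ACTCTCACGGAA")
g166d, g166d_changes = parse_marked_oligo("CACTTCCGGCAGCTCG*T*C*G*ACAGTGGA*C*TACCCTGGCAAATA")

print(translate(n62d, 1), n62d_changes)    # residues 59-65, Asp at 62
print(translate(g166d, 1), g166d_changes)  # residues 158-170, Asp at 166
print("GTCGAC" in g166d)                   # SalI site present?
```

Translating the N62D oligonucleotide in this frame gives QDNDSHG (Gln59 to Gly65 with Asp at 62), and the G166D oligonucleotide gives TSGSSSTVDYPGK, so the silent changes upstream of codon 166 only create the SalI site.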



After the mutant plasmids were produced, they were transformed into a protease-deficient strain of B. subtilis
(BG2036) that lacks an endogenous gene for secretion of subtilisin. These were then tested for protease activity
on skim milk plates.

The first set of mutants tested were ones where segments of the S1 binding site were replaced with
sequences from KEX2. None of these segment replacements produced detectable activity on skim milk plates,
even though variants of subtilisin whose catalytic efficiencies are reduced by as much as 1000-fold do produce
detectable halos (Wells, J.A., Cunningham, B.C., Graycar, T.P. and Estell, D.A. (1986) Philos. Trans. R. Soc.
London Ser. A, 317:415-423). We went on to produce single-residue substitutions that should have less impact
on stability. These mutants, at position 166 in the S1 site and positions 33 and 62 in the S2 site, were chosen
based on the modeling and sequence considerations described above. Fortunately, all single mutants as well as
combination mutants produced activity on skim milk plates and could be purified to homogeneity.
Kinetic analysis of variant subtilisins.

To probe the effects of the G166E and G166D substitutions on specificity at the P1 position, we used substrates
having the form suc-AAPX-pna (SEQ ID NO: 69), where X was either Lys (SEQ ID NO: 58), Arg (SEQ ID NO: 59),
Phe (SEQ ID NO: 56), Met (SEQ ID NO: 60) or Gln (SEQ ID NO: 61). The kcat/Km values were determined
from initial rate measurements and the results are reported in Figure 2. Whereas the wild-type enzyme preferred
Phe>Met>Lys>Arg>Gln, G166E preferred Lys~Phe>Arg~Met>Gln, and G166D preferred
Lys>Phe~Arg~Met>Gln. Thus, both acidic substitutions at position 166 caused a shift in preference toward basic
residues at the P1 site, as previously reported (Wells, J.A., Powers, D.B., Bott, R.R., Graycar, T.P. and Estell,
D.A. (1987a) Proc. Natl. Acad. Sci. USA 84:1219-1223).
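The kcat/Km values above come from initial rate measurements. One common way to obtain them, sketched here on the assumption that substrate concentrations are well below Km (so that v0 ≈ (kcat/Km)[E]0[S]0), is a least-squares fit of initial rate against substrate concentration through the origin. The function names and the synthetic numbers are hypothetical; the specification does not state the fitting procedure used.

```python
from typing import Dict, List, Sequence


def kcat_over_km(
    substrate_m: Sequence[float],
    initial_rates_m_per_s: Sequence[float],
    enzyme_m: float,
) -> float:
    """Estimate kcat/Km in M^-1 s^-1 from initial rates measured at [S] << Km.

    Fits v0 = slope * [S] through the origin by least squares; under the
    low-substrate assumption, slope = (kcat/Km) * [E]0."""
    numerator = sum(s * v for s, v in zip(substrate_m, initial_rates_m_per_s))
    denominator = sum(s * s for s in substrate_m)
    return numerator / denominator / enzyme_m


def rank_substrates(efficiencies: Dict[str, float]) -> List[str]:
    """Order substrate labels from highest to lowest kcat/Km."""
    return sorted(efficiencies, key=lambda name: efficiencies[name], reverse=True)


# Synthetic data generated with kcat/Km = 1e5 M^-1 s^-1 and [E]0 = 1 nM.
estimate = kcat_over_km([1e-5, 2e-5, 4e-5], [1e-9, 2e-9, 4e-9], 1e-9)
print(f"{estimate:.3g}")
print(rank_substrates({"Phe": 3.0e5, "Lys": 1.0e5, "Met": 2.0e5}))
```

Ranking the fitted values is what produces preference orders such as Phe>Met>Lys>Arg>Gln; near-ties (written "~" in the text) would need error estimates to be called.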

The effects of single and double substitutions in the S2 binding site were analyzed with substrates having
the form suc-Ala-Ala-Xaa-Phe-pna (SEQ ID NO: 70) and are shown in Figure 3. At the P2 position the wild-type
enzyme preferred Ala>Pro>Lys>Arg>Asp. In contrast, S33D preferred Ala>Lys~Arg~Pro>Asp and
N62D preferred Lys>Ala>Arg>Pro>Asp. Although the effects were more dramatic for the N62D mutant,
the S33D variant also showed significant improvement toward basic P2 residues and a corresponding reduction
in hydrolysis of the Ala and Asp P2 substrates. We then analyzed the S33D/N62D double mutant, but found that it
exhibited the catalytic efficiency of the worse of the two single mutants for each of the substrates tested.

Despite the less than additive effects seen for the two charged substitutions in the S2 site, we decided to
combine the best S2 site variant (N62D) with either of the acidic substitutions in the S1 site. The two double
mutants, N62D/G166E and N62D/G166D, were analyzed with substrates having the form suc-AAXX-pna (SEQ
ID NO: 71), where XX was either KK (SEQ ID NO: 66), KR (SEQ ID NO: 67), KF (SEQ ID NO: 62), PK (SEQ
ID NO: 58), PF (SEQ ID NO: 56) or AF (SEQ ID NO: 63) (Figure 4). The wild-type preference was
AF>PF~KF>KK~PK>KR, whereas the double mutants had the preference KK>KR>KF>PK~AF>PF. Thus, for
the double mutants there was a dramatic improvement toward cleavage of dibasic substrates and away from
cleavage of the hydrophobic substrates.

The greater than additive effect (or synergy) of these mutations can be seen from ratios of the catalytic
efficiencies for the single and multiple mutants. For example, the G166E variant cannot distinguish Lys from
Phe at the P1 position, yet the N62D/G166E variant cleaves the Lys-Lys substrate about 8 times faster than the
Lys-Phe substrate. Similarly, G166D cleaves the Lys P1 substrate about 3 times faster than the Phe P1
substrate, but the N62D/G166D double mutant cleaves a Lys-Lys substrate 18 times faster than a Lys-Phe
substrate. Thus, as opposed to the reduction in specificity seen for the double mutant in the S2 site, the S1-S2
double mutants enhance specificity for basic residues. It is possible that these two sites bind the dibasic
substrates in a cooperative manner analogous to a chelate effect.
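One way to put the synergy on a common scale is to convert each specificity ratio into a free-energy difference, ΔΔG = RT ln(ratio). The sketch below assumes an assay temperature of 25 °C (not stated in the text) and uses the approximate ratios quoted above. Because the single-mutant ratios were measured on suc-AAPX substrates and the double-mutant ratios on Lys-Lys versus Lys-Phe substrates, this illustrates scale rather than constituting a strict additivity analysis; `ddg_from_ratio` is a hypothetical helper.

```python
import math

R_KCAL_PER_MOL_K = 1.98720e-3  # gas constant, kcal mol^-1 K^-1


def ddg_from_ratio(ratio: float, temperature_k: float = 298.15) -> float:
    """Free-energy difference (kcal/mol) implied by a ratio of kcat/Km values,
    ddG = RT ln(ratio). Positive values mean the numerator substrate is favoured."""
    return R_KCAL_PER_MOL_K * temperature_k * math.log(ratio)


# Approximate Lys-over-Phe preference ratios quoted in the text.
ratios = {
    "G166E (P1 Lys vs Phe)": 1.0,
    "N62D/G166E (KK vs KF)": 8.0,
    "G166D (P1 Lys vs Phe)": 3.0,
    "N62D/G166D (KK vs KF)": 18.0,
}
for label, ratio in ratios.items():
    print(f"{label}: {ddg_from_ratio(ratio):.2f} kcal/mol")
```

On this scale the N62D substitution adds roughly 1.1 to 1.2 kcal/mol of Lys-over-Phe preference in the G166E and G166D backgrounds.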

Therefore, according to the present invention, subtilisin mutants having a preference for dibasic residues
are preferred. According to this aspect of the present invention, substitutions of amino acids corresponding to
amino acids N62 and G166 of subtilisin BPN' produced from Bacillus amyloliquefaciens are prepared. In
particular, amino acids 62 and 166, or their equivalents, in the precursor subtilisin are substituted with the amino
acid residues Asp or Glu. Preferred subtilisin variants according to this aspect of the invention include the
N62D/G166D, N62E/G166E, N62E/G166D, and N62D/G166E variants of subtilisin BPN' and their equivalents.

B. Subtilisin Variants Capable of Cleaving Substrates Having Tribasic Residues

For the preparation of subtilisin variants specific for substrates containing a third basic residue at substrate
position P4, we used the crystal structure of subtilisin BPN' complexed with Ala-Ala-Pro-Phe-boronate (SEQ ID
NO: 56) (Figure 7) in combination with sequence alignments of subtilisin BPN', KEX2, furin, PC2, and P
(Table 3) in designing basic specificity into the S1, S2 and S4 subsites. The two subtilisin BPN' residues
that most prominently display their side chains into the S4 pocket are Y104 and I107 (Figure 7).

Sequence alignments of subtilisin BPN' and the mammalian prohormone-processing proteases (Siezen,
R.J., de Vos, W.M., Leunissen, A.M., and Dijkstra, B.W. (1991) Protein Eng. 4:719-737) (Table 3) reveal that
position 104 is conserved as Asp, and 107 as Glu, in the prohormone converting (Arg-P4 specific) enzymes.
Therefore, these two mutations were introduced either individually or in combination into the dibasic-specific
N62D/G166D subtilisin BPN' background (Table 4).
Table 3  Sequence alignments for the S4 site of subtilisins

                 S4 site, residues 100-110
Subtilisin       GSGQYSWIING    (SEQ ID NO: 77)
KEX2             GDITTEDEAAS    (SEQ ID NO: 78)
Furin            GEVTDAVEARS    (SEQ ID NO: 79)
PC2              PPMTDIIEASS    (SEQ ID NO: 80)
P                GIVTDAIEASS    (SEQ ID NO: 81)



Table 4 describes the oligonucleotides used for site-directed mutagenesis, the protein regions affected by the
mutations, and the relative expression of protein for N62D/G166D subtilisin BPN' variants. Bold type indicates
base changes from the pSS5 (N62D/G166D) template. For "Protein expressed," "+" indicates a high level
of expression of mature enzyme in crude culture medium, and "-" indicates no enzyme detectable.

TABLE 4

Mutant                               Oligonucleotide                                     Protein region     Protein
                                                                                                            expressed
Y104D                                5'-GGTTCCGGCCAAGATAGCTGGATCATT-3'                   S4 pocket          -
                                     (SEQ ID NO: 82)
I107E                                5'-CCAATACAGCTGGGAAATTAACGGAATCG-3'                 S4 pocket          -
                                     (SEQ ID NO: 83)
Y104D/I107E                          5'-GGTTCCGGCCAAGATAGCTGGGAAATTAACGGAATCGA-3'        S4 pocket          -
                                     (SEQ ID NO: 84)
A(-4)R/A(-2)K/Y(-1)R                 5'-AAGAAGATCACCTAAGACATAAGCGCGCGCAGTCCGTGC-3'       Processing site
                                     (SEQ ID NO: 85)
Y104D/A(-4)R/A(-2)K/Y(-1)R           See individual mutations                            S4 pocket,         +
                                                                                         processing site
I107E/A(-4)R/A(-2)K/Y(-1)R           See individual mutations                            S4 pocket,         -
                                                                                         processing site
Y104D/I107E/A(-4)R/A(-2)K/Y(-1)R     See individual mutations                            S4 pocket,         -
                                                                                         processing site
Initial attempts to express the triple mutants in Bacillus were unsuccessful, as indicated by SDS-PAGE
of crude supernatants. We reasoned that the source of the expression problem could lie in the fact that
correct folding and maturation of subtilisin requires autolytic cleavage of its propeptide (Power, S.D.,
Adams, R.M. and Wells, J.A. (1986) Proc. Natl. Acad. Sci. USA 83, 3096-3100). The processing site in
the wild-type enzyme has a sequence that is optimized for the natural substrate preference: AHAY|A (|
denotes the site of cleavage). Although the N62D/G166D subtilisin can still autolyze itself with the
wild-type processing site, the additional S4 pocket mutations could reduce the cleavage to the point where
expression was lowered to a minute level.

To test whether the mutants were expressed poorly due to an inability to process themselves autolytically,
mutations in the processing site were simultaneously incorporated to accommodate the changes in substrate
specificity. Thus, the sequence from positions -4 to -1 was changed from AHAY to RHKR in combination
with the S4 site mutations. For N62D/Y104D/G166D, high levels of expression could then be achieved,
providing an indication that the additional Y104D mutation induced an especially strong preference for P4
Arg over Ala. Variants containing the I107E mutation, however, could not be expressed even with the
change in the processing site.
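The processing-site redesign can be expressed as a small string transformation. In this hypothetical Python sketch, `mutate_processing_site` applies mutations written in the A(-4)R notation of Table 4 to the P4-P1 residues of the prosequence, checking each wild-type residue, and `has_basic_p4_p2_p1` tests whether the result carries Lys or Arg at P4, P2 and P1. Treating Lys and Arg as interchangeable basic residues is an assumption made for illustration.

```python
import re

BASIC = frozenset("KR")
MUTATION = re.compile(r"([A-Z])\((-\d+)\)([A-Z])")


def mutate_processing_site(site: str, mutations: str) -> str:
    """Apply mutations such as 'A(-4)R/A(-2)K/Y(-1)R' to a processing-site
    string whose last character is position -1 (the P1 residue)."""
    residues = list(site)
    for token in mutations.split("/"):
        match = MUTATION.fullmatch(token)
        if match is None:
            raise ValueError(f"cannot parse mutation {token!r}")
        wild_type, position, replacement = match.group(1), int(match.group(2)), match.group(3)
        index = len(residues) + position  # -1 maps to the last residue
        if not 0 <= index < len(residues):
            raise ValueError(f"position {position} lies outside the site")
        if residues[index] != wild_type:
            raise ValueError(f"expected {wild_type} at {position}, found {residues[index]}")
        residues[index] = replacement
    return "".join(residues)


def has_basic_p4_p2_p1(site: str) -> bool:
    """True if P4, P2 and P1 (the last four residues read P4 P3 P2 P1) are K or R."""
    p4, _p3, p2, p1 = site[-4:]
    return p4 in BASIC and p2 in BASIC and p1 in BASIC


new_site = mutate_processing_site("AHAY", "A(-4)R/A(-2)K/Y(-1)R")
print(new_site, has_basic_p4_p2_p1("AHAY"), has_basic_p4_p2_p1(new_site))
```

Checking the wild-type residue before substituting guards against numbering mistakes, which matter here because prosequence positions count backwards from the cleavage site.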

Kinetic analysis of variant subtilisins
The mature N62D/Y104D/G166D variant was purified and analyzed for its ability to hydrolyze several
tetrapeptide-pNA substrates. Table 5 displays the results along with data for the N62D/G166D mutant and
wild-type subtilisin.
The tribasic substrates succinyl-RAKR-pNA (SEQ ID NO: 86) and succinyl-KAKR-pNA (SEQ ID
NO: 87) were hydrolyzed with high catalytic efficiency (kcat/Km) by the triple mutant, at a level similar to that of
wild-type subtilisin toward one of its best substrates, succinyl-AAPF-pNA (SEQ ID NO: 56). In contrast,
the dibasic substrate succinyl-AAKR-pNA (SEQ ID NO: 67) was hydrolyzed 60-fold less efficiently, mostly
due to a diminution of kcat. This indicates a dramatic specificity change from the wild-type preference at P4,
at which hydrophobic residues are strongly favored over charged side chains (Grøn, H. and Breddam, K.
(1992) Biochemistry 31, 8967-8971). In fact, N62D/G166D subtilisin appears to cleave at an alternate site
in the succinyl-RAKR-pNA (SEQ ID NO: 86) substrate, indicating that Arg was not accepted in its wild-type
S4 site.

The large magnitude of the combined specificity changes in the N62D/Y104D/G166D variant is
evidenced by its strong discrimination against substrates that are preferred by the wild-type enzyme. For
example, succinyl-AAPF-pNA (SEQ ID NO: 56) is hydrolyzed 6 x 10^4-fold less efficiently than
succinyl-RAKR-pNA (SEQ ID NO: 86). Clearly, the S4 site mutation greatly improves upon the discriminatory
power of the parent dibasic-specific N62D/G166D subtilisin, for which the ratio of catalytic efficiency for
succinyl-AAKR-pNA versus succinyl-AAPF-pNA is 1.9 x 10^2. The improvement in discrimination (310-fold)
is also higher than would be predicted from the data for hydrolysis of succinyl-RAKR-pNA (SEQ ID
NO: 86) versus succinyl-AAKR-pNA (SEQ ID NO: 67) by the triple mutant (a 60-fold effect).
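The discrimination figures above follow from simple ratios of catalytic efficiencies. This short Python check uses only the rounded values quoted in the text; the variable names are illustrative.

```python
# Ratios of kcat/Km quoted in the text (rounded values).
triple_rakr_over_aapf = 6e4   # N62D/Y104D/G166D: RAKR vs AAPF
parent_aakr_over_aapf = 1.9e2  # N62D/G166D: AAKR vs AAPF
triple_rakr_over_aakr = 60.0  # N62D/Y104D/G166D: RAKR vs AAKR (P4 effect alone)

improvement = triple_rakr_over_aapf / parent_aakr_over_aapf
print(f"improvement in discrimination: about {improvement:.0f}-fold")
print(f"relative to the P4 effect alone: {improvement / triple_rakr_over_aakr:.1f}x larger")
```

With these inputs the quotient is about 3.2 x 10^2, consistent with the roughly 310-fold improvement stated above once rounding of the inputs is allowed for, and about five times the 60-fold P4 effect.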

Therefore, in order to produce subtilisin variants capable of cleaving substrates containing basic
residues at positions P4, P2, and P1, additional site-specific substitutions are made in the dibasic-specific
subtilisin variants. According to this aspect of the invention, substitution of the amino acid corresponding
to Y104 of subtilisin BPN' produced by Bacillus amyloliquefaciens, i.e., amino acid 104 of subtilisin BPN'
or its equivalent, produces a variant having substantially altered substrate specificity. In a preferred
embodiment of the present invention, amino acids corresponding to N62, Y104, and G166 of subtilisin BPN'
are substituted with acidic amino acids, preferably Asp and Glu and most preferably Asp. Subtilisin BPN'
variants N62D/Y104D/G166D, N62D/Y104E/G166D, N62E/Y104D/G166E, N62E/Y104E/G166E,
N62E/Y104D/G166D, N62E/Y104E/G166D, N62D/Y104E/G166E, and N62D/Y104D/G166E, and their
equivalents, are preferred. Most preferred among this group of subtilisin variants are the
N62D/Y104D/G166D subtilisin BPN' variants and their equivalents.
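The preferred variants listed above are every combination of Asp or Glu at the chosen positions. A hypothetical helper, `enumerate_variants`, makes that explicit using `itertools.product`; it reproduces both the eight N62/Y104/G166 variants and the four N62/G166 variants named earlier.

```python
from itertools import product
from typing import List, Sequence, Tuple

Site = Tuple[str, int, str]  # (wild-type residue, BPN' position, allowed replacements)


def enumerate_variants(sites: Sequence[Site]) -> List[str]:
    """Return every combination of the allowed replacements in the usual
    slash-separated notation, e.g. 'N62D/Y104D/G166D'."""
    choices = [replacements for _, _, replacements in sites]
    return [
        "/".join(f"{wt}{pos}{aa}" for (wt, pos, _), aa in zip(sites, combo))
        for combo in product(*choices)
    ]


acidic = "DE"
triples = enumerate_variants([("N", 62, acidic), ("Y", 104, acidic), ("G", 166, acidic)])
doubles = enumerate_variants([("N", 62, acidic), ("G", 166, acidic)])
print(len(triples), triples[0])
print(doubles)
```

Generating the set programmatically makes it easy to confirm that a claimed list is complete, which is the case for both lists in the text.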

Mutagenesis and Synthetic Techniques 
Various techniques are available which may be employed to produce mutant DNA which can encode
the subtilisin variants of the present invention. For instance, it is possible to derive mutant DNA based on
naturally occurring DNA sequences that encode for changes in an amino acid sequence of the resultant
protein relative to a precursor subtilisin. These mutant DNAs can be used to obtain the variants of the present
invention.

According to the invention, specific residues of B. amyloliquefaciens subtilisin are identified for
substitution. These amino acid residue position numbers refer to those assigned to the B. amyloliquefaciens
subtilisin sequence (see the mature sequence in Fig. 1 of U.S. Patent No. 4,760,025). The invention,
however, is not limited to the mutation of this particular subtilisin but extends to precursor subtilisins
containing amino acid residues which are equivalent, as defined herein, to the particular identified residues
in B. amyloliquefaciens subtilisin. Equivalent amino acids can be found in, for instance, subtilisin Carlsberg
from Bacillus licheniformis (Bode et al., (1986) EMBO J., 5:813-818); thermitase from Thermoactinomyces
vulgaris (Gros et al., (1989) J. Mol. Biol. 210:347-367); and proteinase K from Tritirachium album (Betzel
et al., (1988) Acta Crystallogr., B, 44:163-172), as described by Siezen et al., (1991) Protein Eng., 4:719-737.

WO 96/27571 PCT/US96/02861

By way of illustration, with expression vectors encoding the precursor subtilisin in hand (see, for
example, U.S. Patent No. 4,760,025), site-specific mutagenesis (Kunkel et al., (1991) Methods Enzymol.
204:125-139; Carter, P., et al., (1986) Nucl. Acids Res. 13:4331; Zoller, M.J., et al., (1982) Nucl. Acids
Res. 10:6487), cassette mutagenesis (Wells, J.A., et al., (1985) Gene 34:315), restriction selection
mutagenesis (Wells, J.A., et al., (1986) Philos. Trans. R. Soc. London Ser. A 317, 415) or other known
techniques may be performed on the DNA. The mutant DNA can then be used in place of the parent DNA
by insertion into the appropriate expression vectors. Growth of host bacteria containing the expression
vectors with the mutant DNA allows the production of variants which can be isolated as described herein.
Oligonucleotide-mediated mutagenesis is a preferred method for preparing the variants of the present
invention. This technique is well known in the art, as described by Adelman et al., (1983) DNA, 2:183.
Briefly, the native or unaltered DNA of a precursor subtilisin, for instance subtilisin BPN', is altered by
hybridizing an oligonucleotide encoding the desired mutation to a DNA template, where the template is the
single-stranded form of a plasmid or bacteriophage containing the unaltered or native DNA sequence of the
precursor.

After hybridization, a DNA polymerase is used to synthesize an entire second complementary strand
of the template that will thus incorporate the oligonucleotide primer and will code for the selected alteration
in the DNA.

Generally, oligonucleotides of at least 25 nucleotides in length are used. An optimal oligonucleotide
will have 12 to 15 nucleotides that are completely complementary to the template on either side of the
nucleotide(s) coding for the mutation. This ensures that the oligonucleotide will hybridize properly to the
single-stranded DNA template molecule. The oligonucleotides are readily synthesized using techniques
known in the art such as those described by Crea et al. (1978) Proc. Natl. Acad. Sci. USA, 75:5765.
Exemplary oligonucleotide sequences for introducing amino acid changes into precursor subtilisin BPN' are
provided in Tables 2 and 4.
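The length guideline above (12 to 15 perfectly matched bases on each side of the changed codon, at least 25 nucleotides overall) can be turned into a simple design helper. This is a sketch under stated assumptions: the oligonucleotide is written on the same strand as `template`, a single codon is replaced, changed bases are flagged with '*' as in Table 2, and both the helper name and the template are hypothetical.

```python
def design_mutagenic_oligo(template: str, codon_start: int, new_codon: str,
                           flank: int = 12, min_length: int = 25) -> str:
    """Build a mutagenic oligonucleotide with `flank` template bases on each
    side of the replaced codon, marking each changed base with '*'.

    `codon_start` is the 0-based index of the first base of the target codon."""
    if len(new_codon) != 3:
        raise ValueError("new_codon must be a single codon")
    if codon_start - flank < 0 or codon_start + 3 + flank > len(template):
        raise ValueError("template too short for the requested flanks")
    if 2 * flank + 3 < min_length:
        raise ValueError(f"oligonucleotide would be shorter than {min_length} nt")
    old_codon = template[codon_start:codon_start + 3]
    marked_codon = "".join(
        new + ("*" if new != old else "") for new, old in zip(new_codon, old_codon)
    )
    left = template[codon_start - flank:codon_start]
    right = template[codon_start + 3:codon_start + 3 + flank]
    return left + marked_codon + right


template = "A" * 15 + "AAC" + "T" * 15  # hypothetical template with an Asn codon at index 15
oligo = design_mutagenic_oligo(template, 15, "GAC")
print(oligo, len(oligo.replace("*", "")))
```

With the default 12-base flanks the oligonucleotide is 27 nucleotides long; choosing codons that change as few bases as possible (here a single A to G for Asn to Asp) keeps the mismatch small.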

Single-stranded DNA template may also be generated by denaturing double-stranded plasmid (or
other) DNA using standard techniques.

For alteration of the native DNA sequence (to generate amino acid sequence variants, for example),
the oligonucleotide is hybridized to the single-stranded template under suitable hybridization conditions.
A DNA polymerizing enzyme, usually the Klenow fragment of DNA polymerase I, is then added to
synthesize the complementary strand of the template using the oligonucleotide as a primer for synthesis. A
heteroduplex molecule is thus formed such that one strand of DNA encodes the variant form of the subtilisin,
and the other strand (the original template) encodes the native, unaltered sequence of the precursor subtilisin.
This heteroduplex molecule is then transformed into a suitable host cell. After the cells are grown, they are
plated onto agarose plates and screened using the oligonucleotide primer radiolabeled with 32-phosphate to
identify the bacterial colonies that contain the mutated DNA. The mutated region is then removed and
placed in an appropriate vector for protein production, generally an expression vector of the type typically
employed for transformation of an appropriate host.

The method described immediately above may be modified such that a homoduplex molecule is
created wherein both strands of the plasmid contain the mutation(s). The modifications are as follows: The
single-stranded oligonucleotide is annealed to the single-stranded template as described above. A mixture
of three deoxyribonucleotides, deoxyadenosine triphosphate (dATP), deoxyguanosine triphosphate (dGTP), and
deoxythymidine triphosphate (dTTP), is combined with a modified thio-deoxycytidine triphosphate called dCTP(αS) (which
can be obtained from Amersham Corporation). This mixture is added to the template-oligonucleotide
complex. Upon addition of DNA polymerase to this mixture, a strand of DNA identical to the template
except for the mutated bases is generated. In addition, this new strand of DNA will contain dCTP(αS)
instead of dCTP, which serves to protect it from restriction endonuclease digestion.

After the template strand of the double-stranded heteroduplex is nicked with an appropriate restriction
enzyme, the template strand can be digested with ExoIII nuclease or another appropriate nuclease past the
region that contains the site(s) to be mutagenized. The reaction is then stopped to leave a molecule that is
only partially single-stranded. A complete double-stranded DNA homoduplex is then formed using DNA
polymerase in the presence of all four deoxyribonucleotide triphosphates, ATP, and DNA ligase. This
homoduplex molecule can then be transformed into a suitable host cell as described above.

DNA encoding variants with more than one amino acid to be substituted may be generated in one of 
2 0 several ways. If the amino acids are located close together in the polypeptide chain, they may be mutated 
simultaneously using one oligonucleotide that codes for all of the desired amino acid substitutions. If. 
however, the amino acids are located some distance from each other (separated by more than about ten amino 
acids), it is more difficult to generate a single oligonucleotide that encodes all of the desired changes. 
Instead, one of two alternative methods may be employed 
In the first method, a separate oligonucleotide is generated for each amino acid to be substituted. The
oligonucleotides are then annealed to the single-stranded template DNA simultaneously, and the second
strand of DNA that is synthesized from the template will encode all of the desired amino acid substitutions.



The alternative method involves two or more rounds of mutagenesis to produce the desired mutant.
The first round is as described for the single mutants: wild-type DNA is used for the template, an
oligonucleotide encoding the first desired amino acid substitution(s) is annealed to this template, and the
heteroduplex DNA molecule is then generated. The second round of mutagenesis uses the mutated DNA
produced in the first round of mutagenesis as the template. This template therefore already contains one or more
mutations. The oligonucleotide encoding the additional desired amino acid substitution(s) is then annealed
to this template, and the resulting strand of DNA now encodes mutations from both the first and second
rounds of mutagenesis. This resultant DNA can be used as a template in a third round of mutagenesis, and
so on.
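The decision between one oligonucleotide and several can be sketched as a simple grouping rule. The Python below is a hypothetical planning helper and is not part of the procedure described here. It treats the guideline of "about ten amino acids" as a hard cutoff (`max_span=10`), measured from the first residue in each group.

```python
def group_substitutions(positions, max_span=10):
    """Group substitution positions that one mutagenic oligonucleotide could cover.

    Hypothetical helper: a position joins the current group only if it lies
    within `max_span` residues of that group's first position. Otherwise it
    starts a new group, which needs its own oligonucleotide (annealed together
    with the others, or introduced in a later round of mutagenesis).
    """
    groups = []
    for pos in sorted(positions):
        if groups and pos - groups[-1][0] <= max_span:
            groups[-1].append(pos)
        else:
            groups.append([pos])
    return groups


# N62D, Y104D and G166D lie far apart, so each gets its own oligonucleotide.
print(group_substitutions([166, 62, 104]))
# Hypothetical nearby substitutions can share one oligonucleotide.
print(group_substitutions([60, 62, 65]))
```

The cutoff is measured from the start of each group rather than between neighbouring positions. This keeps the total span that one oligonucleotide must cover bounded.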






WO 9*27671 PCT/US96AJ2W1 
Cleavage of Fusion Proteins With Subtilisin Variants

A fusion protein is any polypeptide that contains three parts: an affinity domain (AD), which usually aids
in protein purification; a protease cleavage sequence or substrate linker (SL), which is cleaved by a protease;
and a protein product of interest (PP). Such fusion proteins are generally expressed by recombinant DNA
technology. The genes for fusion proteins are designed so that the SL lies between the AD and PP. They
usually take the form AD-SL-PP, so that the AD is the domain closest to the N-terminus and the PP is closest to
the C-terminus.

Examples of AD include the following:
- glutathione-S-transferase, which binds to glutathione;
- protein A (or derivatives or fragments thereof), which binds IgG molecules;
- poly-histidine sequences, particularly (His)6 (SEQ ID NO: 51), which bind metal affinity columns;
- maltose binding protein, which binds maltose;
- human growth hormone, which binds the human growth hormone receptor; and
- any of a variety of other proteins or protein domains that can bind to an immobilized affinity support with an association constant (Ka) of > 10* M⁻¹.

The SL can be any sequence that is cleaved by the subtilisin variants of the present invention. In
preparations where the variant N62D/Y104D/G166D or its equivalent is used, the SL can be any sequence,
preferably at least 4 amino acids long, in which the P4, P2, and P1 residues are basic. An SL
of the general formula P4-P3-P2-P1 is therefore employed, in which P4, P2, and P1 are basic amino acid
residues. Preferred SLs according to this aspect of the invention include Lys-Ala-Lys-Arg (SEQ ID NO: 87)
and Arg-Ala-Lys-Arg (SEQ ID NO: 86).
Likewise, where the N62D/G166D subtilisin variant is contemplated, the SL preferably contains dibasic
residues. For the variants capable of cleaving substrates containing dibasic residues, the SL should be
at least four residues long. It should preferably contain a large hydrophobic residue at P4 (such as Leu or Met) and
dibasic residues at P2 and P1 (such as Arg and Lys). A particularly good substrate is Leu-Met-Arg-Lys
(SEQ ID NO: 52). A variety of other sequences may also work, including Ala-Ser-Arg-Arg (SEQ ID NO: 50)
and even the monobasic Leu-Thr-Ala-Arg (SEQ ID NO: 53).
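The two linker rules above can be written as pattern checks on the last four SL residues (P4-P3-P2-P1, in one-letter codes). This is a minimal sketch with two illustrative assumptions, neither of which is a definition from this specification:
- "basic" means Lys or Arg (His is left out);
- "large hydrophobic" means Leu, Met, Ile or Phe.

A linker that fails the N62D/G166D check, such as ASRR, may still be cleaved. The check only flags the preferred pattern.

```python
BASIC = {"K", "R"}                        # assumption: Lys and Arg only
LARGE_HYDROPHOBIC = {"L", "M", "I", "F"}  # assumption: "such as Leu or Met"


def _p4_to_p1(sl):
    """Return the P4, P3, P2, P1 residues, i.e. the last four residues of the SL."""
    if len(sl) < 4:
        raise ValueError("substrate linker must have at least four residues")
    return tuple(sl[-4:])


def fits_triple_variant(sl):
    """Preferred SL for N62D/Y104D/G166D: basic residues at P4, P2 and P1."""
    p4, _p3, p2, p1 = _p4_to_p1(sl)
    return p4 in BASIC and p2 in BASIC and p1 in BASIC


def fits_double_variant(sl):
    """Preferred SL for N62D/G166D: large hydrophobic P4, dibasic P2-P1."""
    p4, _p3, p2, p1 = _p4_to_p1(sl)
    return p4 in LARGE_HYDROPHOBIC and p2 in BASIC and p1 in BASIC


for sl in ["KAKR", "RAKR", "LMRK", "ASRR", "LTAR"]:
    print(sl, fits_triple_variant(sl), fits_double_variant(sl))
```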

It is often useful for the SL to contain a flexible segment at its N-terminus to better separate it from the
AD and PP. Such sequences include Gly-Pro-Gly-Gly (SEQ ID NO: 54) but can be as simple as Gly-Gly
or Pro-Gly. Two examples of particularly good SLs follow. For subtilisin variants capable of cleaving
substrates containing dibasic amino acids, a good SL is Gly-Pro-Gly-Gly-Leu-Met-Arg-Lys (SEQ ID NO: 88).
For variants that cleave after basic residues at P4, P2, and P1, a good SL is
Gly-Pro-Gly-Gly-Lys-Ala-Lys-Arg (SEQ ID NO: 89). Either sequence would be inserted between the AD and
PP domains.
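The AD-SL-PP layout can be sketched as string assembly on one-letter sequences. In this hypothetical example the AD is a (His)6 tag and the PP is an arbitrary placeholder peptide (`ASQDQLE`); neither is taken from the examples below. The function also reports where the variant is expected to cut, which is after P1, the last residue of the SL.

```python
def build_fusion(ad, sl, pp, spacer="GPGG"):
    """Assemble an AD-SL-PP fusion from one-letter sequences.

    The flexible spacer (Gly-Pro-Gly-Gly by default) sits on the N-terminal
    side of the cleavage motif. Returns the fusion sequence and the index of
    the first PP residue, where cleavage after P1 should release the product.
    """
    linker = spacer + sl
    fusion = ad + linker + pp
    cut_index = len(ad) + len(linker)
    return fusion, cut_index


fusion, cut = build_fusion("HHHHHH", "LMRK", "ASQDQLE")  # hypothetical AD and PP
print(fusion)                                   # HHHHHHGPGGLMRKASQDQLE
print(fusion[cut - 4:cut], "|", fusion[cut:])   # LMRK | ASQDQLE
```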

The PP can be virtually any protein or peptide of interest. Preferably, though, it should not have Pro, Ile,
Thr, Val, Asp or Glu as its first residue (P1'), Pro or Gly as its second residue (P2'), or Pro as its third
residue (P3'). These residues are poor substrates for the enzyme and may impair the ability of the subtilisin
variant to cleave the SL sequence.
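These restrictions on the N-terminus of the PP can be checked before a fusion gene is built. The sketch below encodes the residue lists given above in one-letter codes. The function name and the warning format are hypothetical.

```python
# Residues reported above as poor at each prime-side position of the cleavage site.
POOR_PRIME_RESIDUES = {
    "P1'": set("PITVDE"),  # Pro, Ile, Thr, Val, Asp, Glu
    "P2'": set("PG"),      # Pro, Gly
    "P3'": set("P"),       # Pro
}


def prime_side_warnings(pp):
    """List (site, residue) pairs where the PP N-terminus is a poor substrate."""
    warnings = []
    for offset, site in enumerate(["P1'", "P2'", "P3'"]):
        if offset < len(pp) and pp[offset] in POOR_PRIME_RESIDUES[site]:
            warnings.append((site, pp[offset]))
    return warnings


print(prime_side_warnings("ASQDQLE"))  # no warnings
print(prime_side_warnings("PGP"))      # all three positions flagged
```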

Cleavage of the fusion protein is best done in aqueous solution, although it should be possible to
immobilize the enzyme and cleave the soluble fusion protein. It may also be possible to cleave the fusion
protein with the soluble subtilisin variant while the fusion remains immobilized on a solid support (e.g. bound
to the support through the AD). The enzyme is preferably added to the fusion protein at
less than one part in 100 (1:100) by weight. A good buffer is 10-50 mM Tris (pH 8.2) in 10 mM NaCl.
A preferable temperature is about 25°C, although the enzyme is active up to 65°C. The extent of cleavage
can be assayed by running samples on SDS-PAGE. In general, suitable conditions for using the subtilisin
variants of this invention do not depart substantially from those known in the art for the use of other
subtilisins.
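A worked example of the 1:100 (w/w) guideline follows. The sketch computes the largest enzyme mass, and the matching volume of an enzyme stock, for a given amount of fusion protein. The 5 mg of fusion protein and the 1 mg/mL enzyme stock are hypothetical numbers chosen for illustration.

```python
def max_enzyme_dose(fusion_mg, stock_mg_per_ml, ratio=100):
    """Upper limit on enzyme for an enzyme:substrate ratio below 1:ratio (w/w).

    Returns (enzyme mass in mg, stock volume in microliters).
    """
    enzyme_mg = fusion_mg / ratio
    volume_ul = enzyme_mg / stock_mg_per_ml * 1000.0
    return enzyme_mg, volume_ul


mg, ul = max_enzyme_dose(fusion_mg=5.0, stock_mg_per_ml=1.0)
print(f"add less than {mg:.3f} mg enzyme ({ul:.0f} uL of stock)")
```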

EXAMPLES 

In the examples below and elsewhere, the following abbreviations are used:
- subtilisin BPN', subtilisin from Bacillus amyloliquefaciens;
- Boc-RVRR-MCA (SEQ ID NO: 73), N-t-butoxycarbonyl-arginine-valine-arginine-arginine-…;
- N-…-alanine-alanine-… (SEQ ID NO: 56);
- hGH, human growth hormone;
- hGHbp, extracellular domain of the hGH receptor;
- PBS, phosphate-buffered saline;
- AP, alkaline phosphatase.

Example 1 

Construction and Purification of Subtilisin Mutants. 
Site-directed mutations were introduced into the subtilisin BPN' gene cloned into the phagemid pSS5
(Wells, J. A., Ferrari, E., Henner, D. J., Estell, D. A. and Chen, E. Y. (1983) Nucl. Acids Res. 11:7911-7929).
Single-stranded uracil-containing pSS5 template was prepared, and mutagenesis was performed using the
method of Kunkel (Kunkel, T. A., Bebenek, K. and McClary, J. (1991) Methods Enzymol. 204:125-139).
For example, the synthetic oligonucleotide N62D,

5'-CCAAGACAACG*ACTCTCACGGAA-3' (SEQ ID NO: 25)

in which the asterisk denotes a mismatch to the wild-type sequence, was used to construct the N62D mutant.
The oligonucleotide was first phosphorylated at the 5' end using T4 polynucleotide kinase according to a
described procedure (Sambrook, J., Fritsch, E. F., and Maniatis, T. (1989) in "Molecular Cloning: A
Laboratory Manual," Second Edition, Cold Spring Harbor, N.Y.). The phosphorylated oligonucleotide was
annealed to single-stranded uracil-containing pSS5 template. The complementary DNA strand was filled in
with deoxynucleotides using T7 DNA polymerase, and the resulting nicks were ligated using T4 DNA ligase
according to a previously described procedure (Sambrook, J., Fritsch, E. F., and Maniatis, T. (1989) in
"Molecular Cloning: A Laboratory Manual," Second Edition, Cold Spring Harbor, N.Y.). Heteroduplex
DNA was transformed into the E. coli host JM101 (Yanisch-Perron, C., Vieira, J., and Messing, J. (1985) Gene
33:103-119). Putative mutants were confirmed by preparation and dideoxynucleotide sequencing of
single-stranded DNA (Sanger, F., Nicklen, S. and Coulson, A. R. (1977) Proc. Natl. Acad. Sci. USA
74:5463-5467) according to the SEQUENASE® protocol (USB Biochemicals). Mutant single-stranded
DNA was then retransformed into JM101 cells, and double-stranded DNA was prepared according to a previously
described procedure (Sambrook, J., Fritsch, E. F., and Maniatis, T. (1989) in "Molecular Cloning: A
Laboratory Manual," Second Edition, Cold Spring Harbor, N.Y.). The oligonucleotides used for other
mutations that also required only one primer are listed in Table 2. Several of these oligonucleotides also
introduced silent mutations that create new restriction sites, providing an alternative way to verify
mutagenesis.
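The design of the N62D primer can be checked by aligning it with the matching wild-type stretch and translating both in frame. The wild-type fragment in this sketch is inferred rather than quoted. It is the primer with the asterisked G restored to A, which turns the Asp codon GAC at position 62 back into the Asn codon AAC. The reading frame and residue numbering follow the coding sequence in SEQ ID NO: 1, where the first complete codon, CAA, encodes Gln59. The small codon table is an assumption that covers only the codons present.

```python
# Only the codons that occur in this 23-mer; a full table is not needed here.
CODONS = {"CAA": "Q", "GAC": "D", "AAC": "N", "TCT": "S", "CAC": "H", "GGA": "G"}

MUTAGENIC = "CCAAGACAACGACTCTCACGGAA"  # SEQ ID NO: 25 with the asterisk removed
WILD_TYPE = "CCAAGACAACAACTCTCACGGAA"  # inferred wild-type stretch (assumption)


def translate(dna, frame=1):
    """Translate the complete codons starting at index `frame`."""
    return "".join(CODONS[dna[i:i + 3]] for i in range(frame, len(dna) - 2, 3))


def mismatch_positions(a, b):
    """Return the 0-based positions where two equal-length sequences differ."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


def substitutions(wt_dna, mut_dna, frame=1, first_residue=59):
    """Describe amino acid changes as wild-type residue, position, new residue."""
    wt, mut = translate(wt_dna, frame), translate(mut_dna, frame)
    return [f"{w}{first_residue + i}{m}"
            for i, (w, m) in enumerate(zip(wt, mut)) if w != m]


print(mismatch_positions(WILD_TYPE, MUTAGENIC))  # [10]
print(translate(MUTAGENIC))                      # QDNDSHG
print(substitutions(WILD_TYPE, MUTAGENIC))       # ['N62D']
```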

To construct the double mutants N62D/G166D and N62D/G166E, pSS5 DNA containing the N62D
mutation was produced in single-stranded uracil-containing form using the Kunkel procedure (Kunkel, T.
A., Bebenek, K. and McClary, J. (1991) Methods Enzymol. 204:125-139). This mutant DNA was used as
the template for introducing the G166D or G166E mutations, using the appropriate
oligonucleotide primers (see sequences in Table 2) and following the procedures described above.

To construct the triple mutants, such as N62D/Y104D/G166D, pSS5 DNA containing the
N62D/G166D mutation, or another appropriate double mutation, was produced in single-stranded
uracil-containing form using the Kunkel procedure (Kunkel, T. A., Bebenek, K. and McClary, J. (1991) Methods
Enzymol. 204:125-139). This mutant DNA was used as the template for introducing the Y104D
mutation, using the appropriate oligonucleotide primers (see sequences in Table 4) and following the
procedures described above.
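The chaining of templates (N62D first, then G166D on the N62D template, then Y104D on the double mutant) can be sketched at the protein-sequence level. Before each substitution, the sketch checks that the position still carries the expected residue. This mirrors why each round must start from the product of the previous one. The 275-residue test sequence is a hypothetical placeholder that carries only Asn62, Tyr104 and Gly166. It is not the subtilisin BPN' sequence.

```python
import re

MUTATION = re.compile(r"([A-Z])(\d+)([A-Z])")


def apply_substitution(protein, mutation):
    """Apply one substitution such as 'N62D' (1-based residue numbering)."""
    match = MUTATION.fullmatch(mutation)
    if match is None:
        raise ValueError(f"cannot parse mutation {mutation!r}")
    expected, pos, new = match.group(1), int(match.group(2)), match.group(3)
    i = pos - 1
    if protein[i] != expected:
        raise ValueError(f"{mutation}: template has {protein[i]} at {pos}")
    return protein[:i] + new + protein[i + 1:]


def mutagenize_in_rounds(template, rounds):
    """Each round uses the product of the previous round as its template."""
    for round_mutations in rounds:
        for mutation in round_mutations.split("/"):
            template = apply_substitution(template, mutation)
    return template


# Hypothetical placeholder: Ala everywhere except positions 62, 104 and 166.
toy = ["A"] * 275
toy[61], toy[103], toy[165] = "N", "Y", "G"
toy = "".join(toy)

triple = mutagenize_in_rounds(toy, ["N62D", "G166D", "Y104D"])
print(triple[61], triple[103], triple[165])  # D D D
```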

For expression of the subtilisin BPN' mutants, double-stranded mutant DNA was transformed into a
protease-deficient strain (BG2036) of Bacillus subtilis (Yang, M. Y., Ferrari, E. and Henner, D. J.
(1984) Journal of Bacteriology 160:15-21) according to a previous method (Anagnostopoulos, C. and
Spizizen, J. (1961) Journal of Bacteriology 81:741-746). Transformation mixtures were plated out
on LB plus skim milk plates containing 12.5 µg/mL chloramphenicol. The clear halos around transformed
colonies, which indicate skim milk digestion, were used as a rough estimate of secreted protease activity.

The transformed BG2036 strains were cultured by inoculating 5 mL of 2xYT medium (Miller, J. H.
(1972) in "Experiments in Molecular Genetics," Cold Spring Harbor, N.Y.) containing 12.5 µg/mL
chloramphenicol and 2 mM CaCl2 and growing at 37°C for 18-20 h. The cultures were then diluted 1:100 in the same
medium and grown in shake flasks at 37°C for 18-22 h with vigorous aeration. The cells were pelleted by
centrifugation (6000g, 15 min, 4°C), and CaCl2 (20 mM final) and one volume of ethanol (-20°C) were added to the
supernatant. After 30 min at 4°C, the solution was centrifuged (12,000g, 15 min, 4°C), and one
volume of ethanol (-20°C) was added to the supernatant. After 2 h at -20°C, the solution was centrifuged
(12,000g, 15 min, 4°C). The pellet was resuspended in, and dialyzed overnight at 4°C against, MC (25 mM
2-(N-morpholino)ethanesulfonic acid (MES), 5 mM CaCl2 at pH 5.5). The dialysate was passed
through a 0.22 µm syringe filter and loaded onto a Mono S cation exchange column run by an FPLC system
(Pharmacia Biotechnology). The column was washed with 20 volumes of MC, and mutant subtilisin was eluted
over a linear gradient of zero to 0.15 M NaCl in MC, all at a flow rate of 1 mL/min. Peak fractions were
recovered, and the subtilisin mutant was quantitated by measuring the absorbance at 280 nm (E280 0.1% = 1.17)
(Matsubara, H., Kasper, C. B., Brown, D. M. and Smith, E. L. (1965) J. Biol. Chem. 240:1125-1130).
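Quantitation from A280 is a single division. With E280 (0.1%) = 1.17, a 1 mg/mL solution has an absorbance of 1.17 in a 1 cm cell. The sketch below also converts the result to a molar concentration, which requires the molecular mass of the enzyme. The value of about 27,500 Da used here is an assumption for illustration and is not stated in this specification.

```python
E280_0_1_PERCENT = 1.17    # absorbance of a 1 mg/mL (0.1%) solution, 1 cm path
ASSUMED_MW_DA = 27_500.0   # assumption: approximate mass of subtilisin BPN'


def subtilisin_concentration(a280, path_cm=1.0, mw_da=ASSUMED_MW_DA):
    """Return (mg/mL, micromolar) from a measured A280."""
    mg_per_ml = a280 / (E280_0_1_PERCENT * path_cm)
    micromolar = mg_per_ml / mw_da * 1e6   # mg/mL = g/L; (g/L) / (g/mol) = mol/L
    return mg_per_ml, micromolar


mg_ml, um = subtilisin_concentration(0.585)
print(f"{mg_ml:.3f} mg/mL, {um:.1f} uM")
```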
Example 2

Kinetic Characterizations 

Subtilisins were assayed by measuring the initial rates of hydrolysis of p-nitroanilide tetrapeptide
substrates in 0.4 mL of 20 mM Tris-Cl pH 8.2, 4% (v/v) dimethyl sulfoxide at (25 ± 0.2)°C, as described
previously (Estell, D. A., Graycar, T. P., Miller, J. V., Powers, D. B., Burnier, J. P., Ng, P. G. and Wells,
J. A. (1986) Science 233:659-663). Enzyme concentrations [E]0 were determined spectrophotometrically
using E280 (0.1%) = 1.17 (Matsubara, H., Kasper, C. B., Brown, D. M. and Smith, E. L. (1965) J. Biol.
Chem. 240:1125-1130) and were typically 5-50 nM in reactions. Initial rates were determined for nine to
twelve different substrate concentrations over the range 0.001-2.0 mM. Plots of initial rate (v) against
substrate concentration [S] were fitted to the Michaelis-Menten equation,

$$v = \frac{k_{cat}[E]_0[S]}{K_m + [S]}$$

to determine the kinetic constants kcat and Km (Fersht, A. in "Enzyme Structure and Mechanism," Second
Edition, Freeman and Co., N.Y.), using the program Kaleidagraph (Synergy Software, Reading, PA).
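The fit above was done with a commercial program. The plain-Python sketch below is a substitute method, not the program used in this specification. For any trial Km, the best-fitting Vmax = kcat[E]0 has a closed-form linear least-squares solution, so only Km needs to be searched, here by golden-section search on log10(Km). The test data are synthetic, generated from assumed constants (kcat = 5 s⁻¹, Km = 0.2 mM, [E]0 = 20 nM).

```python
import math


def _best_vmax_and_sse(km, s, v):
    """For a fixed Km, Vmax has a closed-form least-squares solution."""
    x = [si / (km + si) for si in s]
    vmax = sum(vi * xi for vi, xi in zip(v, x)) / sum(xi * xi for xi in x)
    sse = sum((vi - vmax * xi) ** 2 for vi, xi in zip(v, x))
    return vmax, sse


def fit_michaelis_menten(s, v, e0, log10_km_bounds=(-4.0, 2.0), iterations=200):
    """Return (kcat, Km) by golden-section search over log10(Km).

    `s` and Km share units (here mM). `v` and `e0` share concentration units
    (here uM), so kcat has the inverse time unit of `v` (here 1/s).
    """
    def sse(log_km):
        return _best_vmax_and_sse(10 ** log_km, s, v)[1]

    g = (math.sqrt(5) - 1) / 2
    a, b = log10_km_bounds
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = sse(c), sse(d)
    for _ in range(iterations):
        if fc < fd:                  # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = sse(c)
        else:                        # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = sse(d)
    km = 10 ** ((a + b) / 2)
    vmax, _ = _best_vmax_and_sse(km, s, v)
    return vmax / e0, km


# Synthetic, noise-free data: 12 concentrations spanning 0.001-2.0 mM.
e0_um = 0.02                                          # 20 nM enzyme
s_mm = [0.001 * 2000 ** (i / 11) for i in range(12)]
v_um_per_s = [5.0 * e0_um * s / (0.2 + s) for s in s_mm]

kcat, km = fit_michaelis_menten(s_mm, v_um_per_s, e0_um)
print(f"kcat = {kcat:.3f} 1/s, Km = {km:.4f} mM")
```

The search assumes the squared error has a single minimum within the bounds. That holds for well-behaved data like these, but noisy real data should be inspected by plotting.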



Example 3 

Substrate Phage 

Substrate phage selections were performed as described by Matthews and Wells (Matthews, D. J. and
Wells, J. A. (1993) Science 260:1113-1117), with minor modifications. Phage sorting was carried out using
a library in which the linker sequence between the gene III coat protein and a tight-binding variant of hGH
was GPGGX5GGPG (SEQ ID NO: 52). The library contained 2 x 10^6 independent transformants. Phage
particles were prepared by infecting 1 mL of log-phase 27C7 (F'/tetR/OmpT- degP-) Escherichia coli with
approximately 10 s library phage for 1 h at 37°C. This was followed by 18-24 h of growth at 37°C in 25 mL 2YT medium
containing 10^10 M13K07 helper phage and 50 µg/mL carbenicillin. Wells of a 96-well Nunc
Maxisorp microtiter plate were coated overnight at 4°C with 2 µg/mL of hGHbp in 50 mM NaHCO3 at pH 9.6.
They were then blocked for 1 h at room temperature with PBS (10 mM sodium phosphate at pH 7.4 and 150 mM NaCl)
containing 2.5% (w/v) skim milk. Between 10 n and 10 ,J phage in 0.1 mL 10 mM Tris-Cl (pH 7.6), 1
mM EDTA, and 100 mM NaCl were incubated in the wells at room temperature for 2 h with gentle agitation.
The plate was washed first with 20 rinses of PBS plus 0.05% Tween 20 and then twice with 20 mM Tris-Cl
at pH 8.2. The N62D/G166D subtilisin was added in 0.1 mL of 20 mM Tris-Cl at pH 8.2, and protease-sensitive
phage were eluted after a variable reaction time. To increase selectivity, the protease concentration and the
incubation time for elution of sensitive phage were decreased gradually over the course of sorting:
- protease concentration was 0.2 nM in rounds 1-3 and 0.1 nM in rounds 4-9;
- reaction time was 5 min in rounds 1-6, 2.5 min in round 7, 40 s in round 8 and 20 s in round 9.

Control wells with no protease were also included in each round. For the resistant phage pool, the
incubation time with protease remained constant at 5 min. The wells were then washed ten times with PBS
plus 0.05% Tween 20. Resistant phage were eluted by treatment with 0.1 mL of 0.2 M glycine at pH 2.0 in
PBS plus 0.05% Tween 20 for 1 min at room temperature. Protease-sensitive and resistant phage pools were
titered and used to infect log-phase 27C7 cells for 1 h at 37°C. The cells were then centrifuged at 4000 rpm,
the supernatant removed, and the cells resuspended in 1 mL 2YT medium. The infected cells were grown for 18-24 h in the presence of
helper phage as described above, and the process was repeated 9 times. Selected substrates were introduced into
AP fusion proteins and assayed for relative rates of cleavage as described by Matthews and Wells (Matthews,
D. J., Goodman, L. J., Gorman, C. M., and Wells, J. A. (1994) Protein Science 3:1197-1205 and Matthews,
D. J. and Wells, J. A. (1993) Science 260:1113-1117), except that the cleavage reactions were performed in
20 mM Tris-Cl at pH 8.2.
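The progressive tightening across the nine sensitive-pool rounds can be summarized by multiplying protease concentration by reaction time. The schedule values are those listed above. Treating this product (nM x min) as a stringency index is an assumption made here for illustration. It is reasonable only while cleavage is far from complete, so that the fraction cut scales with [E] x t.

```python
# (protease nM, reaction time in minutes) for sensitive-pool rounds 1-9.
SCHEDULE = (
    [(0.2, 5.0)] * 3          # rounds 1-3
    + [(0.1, 5.0)] * 3        # rounds 4-6
    + [(0.1, 2.5),            # round 7
       (0.1, 40 / 60),        # round 8: 40 s
       (0.1, 20 / 60)]        # round 9: 20 s
)

exposure = [conc * minutes for conc, minutes in SCHEDULE]
for rnd, value in enumerate(exposure, start=1):
    print(f"round {rnd}: {value:.3f} nM*min ({value / exposure[0]:.1%} of round 1)")
```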

Example 4 

Substrate phage selection and cleavage of a fusion protein

Subtilisin can bind substrates from the P4 to the P3' positions (McPhalen, C. A. and
James, M. N. G. (1988) Biochemistry 27:6582-6598 and Bode, W., Papamokos, E., Musil, D., Seemueller, U.
and Fritz, H. (1986) EMBO J. 5:813-818). Given this extensive binding site, and the apparently cooperative
way in which the substrate binds the enzyme, we wished to explore the substrate preferences of the
enzyme more broadly. To do this we used the substrate phage selection (Matthews, D. J., Goodman,
L. J., Gorman, C. M., and Wells, J. A. (1994) Protein Science 3:1197-1205 and Matthews, D. J. and Wells,
J. A. (1993) Science 260:1113-1117) described in Example 3. The method works as follows:
1. A five-residue substrate linker flanked by di-glycine residues is inserted between an affinity domain (here a high-affinity variant of hGH) and the carboxy-terminal domain of gene III, a minor coat protein displayed on the surface of the filamentous phage M13.
2. The five-residue substrate linker is fully randomized to generate a library of 20^5 different protein sequence variants. These are displayed on the phage particles, which are allowed to bind to the hGHbp.
3. The protease of interest is added. If it cleaves a phage particle at the substrate linker, that particle is released.
4. Particles released by protease treatment can be propagated and subjected to another round of selection to further enrich for good protease substrates. Retained particles can likewise be propagated to enrich for poor protease substrates.
5. Sequencing the isolated phage genes at the end of either selection identifies good and poor substrates for further analysis.
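The size of the randomized space shows why a library of 2 x 10^6 transformants (Example 3) samples it only partially. For illustration only, the sketch below assumes that each transformant carries a protein sequence drawn uniformly from the 20^5 possibilities. Real libraries are biased by the codon scheme used, so the coverage figure is a rough estimate and not a property of the library described here.

```python
import math

RANDOMIZED_RESIDUES = 5
protein_variants = 20 ** RANDOMIZED_RESIDUES     # 3,200,000
transformants = 2e6                              # library size from Example 3

# With uniform sampling, the expected fraction of variants seen at least once
# is 1 - exp(-N / V) (Poisson approximation).
expected_coverage = 1 - math.exp(-transformants / protein_variants)

print(f"{protein_variants:,} variants, expected coverage {expected_coverage:.1%}")
```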

We chose to focus on the subtilisin BPN' variant N62D/G166D because it was slightly better at
discriminating the synthetic dibasic substrates from the others. We subjected the substrate phage library to
nine rounds of selection with this variant and isolated clones that were increasingly either sensitive
or resistant to cleavage. Of twenty-one clones sequenced from the sensitive pool, eighteen contained dibasic
residues. Eleven of these had the substrate linker sequence Asn-Leu-Met-Arg-Lys (SEQ ID NO: 35) (Table
6).
TABLE 6

Substrate phage sequences sensitive or resistant to N62D/G166D subtilisin, from a GGXXXXXGG library,
after 9 rounds of selection*.

Protease Sensitive Pool

Nonbasic Sites (0)    Monobasic Sites (3)    Dibasic Sites (18)
                      N L T A R (3)          N L M R K
                      (SEQ ID NO: 34)        (SEQ ID NO: 35)
                                             T A S R R (4)
                                             (SEQ ID NO: 36)
                                             L T R R S
                                             (SEQ ID NO: 37)
                                             A L S R R
                                             (SEQ ID NO: 38)
                                             L M L R K
                                             (SEQ ID NO: 39)

Protease Resistant Pool

Nonbasic Sites (7)    Monobasic Sites        Dibasic Sites (1)
A S T H F             Q K P K F              R K P T H
(SEQ ID NO: 40)       (SEQ ID NO: 41)        (SEQ ID NO: 42)
I Q Q Q V             R P G A M
(SEQ ID NO: 43)       (SEQ ID NO: 44)
Q G E L P
(SEQ ID NO: 45)
A P D P T
(SEQ ID NO: 46)
Q L L E H
(SEQ ID NO: 47)
V K K K H
(SEQ ID NO: 48)
A Q S K !•
(SEQ ID NO: 49)

* Numbers in parentheses after a sequence indicate the number of times that particular DNA sequence was isolated.

Three (3) of the sensitive sequences were monobasic, Asn-Leu-Thr-Ala-Arg (SEQ ID NO: 34). Subtilisin is
known to prefer hydrophobic residues at the P4 position. If these and the other
selected substrates were indeed cleaved after the last basic residue, they would all have Leu, Met or Ala at
the P4 position. Almost no basic residues were isolated in the protease-resistant pool, and those that were had
a Pro following the mono- or dibasic residue. Subtilisin is known not to cleave substrates containing
Pro at the P1' position (Carter, P., Nilsson, B., Burnier, J., Burdick, D. and Wells, J. A. (1989) Proteins:
Struct., Funct., Genet. 6:240-248). Dibasic substrates were therefore highly selected, and they had the
additional feature of Leu, Met or Ala at the P4 position.
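The P4 argument can be reproduced mechanically. Place P1 at the last Lys or Arg in each selected linker and read the residue three positions upstream. The sketch assumes, as the text does, that cleavage occurs after the last basic residue. It counts only Lys and Arg as basic. It also pads each linker with the flanking Gly-Gly of the library described above, so that positions upstream of the linker are defined.

```python
BASIC = set("KR")
FLANK = "GG"   # Gly-Gly context on each side of the randomized linker


def p4_if_cut_after_last_basic(linker):
    """Return the P4 residue assuming cleavage after the last Lys/Arg."""
    context = FLANK + linker + FLANK
    basic = [i for i, aa in enumerate(linker) if aa in BASIC]
    if not basic:
        return None
    p1 = basic[-1] + len(FLANK)
    return context[p1 - 3]


sensitive = ["NLTAR", "NLMRK", "TASRR", "LTRRS", "ALSRR", "LMLRK"]
print({seq: p4_if_cut_after_last_basic(seq) for seq in sensitive})
```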

Example 5 

Cleavage of Substrate Linkers

We wished to analyze how efficiently the most frequently selected sequences were cleaved in the
context of a fusion protein. For this we used an alkaline phosphatase fusion protein assay (Matthews, D.
J., Goodman, L. J., Gorman, C. M., and Wells, J. A. (1994) Protein Science 3:1197-1205 and Matthews, D.
J. and Wells, J. A. (1993) Science 260:1113-1117). The assay proceeded as follows:
1. The hGH-substrate linker domains were excised from the phage vector by PCR and fused in front of the gene for E. coli AP.
2. The fusion protein was expressed and purified on an hGH receptor affinity column.
3. The fusion protein was bound to the hGH receptor on a plate and treated with the subtilisin variant.
4. The rate of cleavage of the fusion protein from the plate was monitored by collecting soluble fractions as a function of time and assaying them for AP activity (Figure 5).

The most frequently isolated substrate sequence, Asn-Leu-Met-Arg-Lys (SEQ ID NO: 35), was cleaved
about ten times faster than the next most frequently isolated clones (Thr-Ala-Ser-Arg-Arg (SEQ ID NO: 36)
and Asn-Leu-Thr-Ala-Arg (SEQ ID NO: 34)). The cleaved AP products were also recovered and subjected to
N-terminal sequencing to determine the sites of cleavage (Figure 5, cleavage site denoted by an arrow). In all
three fusion proteins, this site immediately followed the dibasic or monobasic site, as expected from the
design of the mutant subtilisin. We also tested the dibasic sequence isolated from the resistant pool,
Arg-Lys-Pro-Thr-His (SEQ ID NO: 42). We observed no cleavage above background for this substrate
during the assay.
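A relative cleavage rate, such as the roughly tenfold difference above, can be estimated from release time courses if release is assumed to be first order. The fraction released is then f(t) = 1 - exp(-kt), so -ln(1 - f) is linear in t with slope k. The sketch below fits that slope through the origin. It assumes that AP activity is proportional to the amount of fusion protein released and that the fully released level is known. The time points and fractions are synthetic, generated from hypothetical rate constants, and are not data from Figure 5.

```python
import math


def first_order_rate(times, fractions):
    """Least-squares slope through the origin of -ln(1 - f) versus t."""
    y = [-math.log(1.0 - f) for f in fractions]
    return sum(t * yi for t, yi in zip(times, y)) / sum(t * t for t in times)


# Synthetic time courses (minutes) generated from hypothetical k values.
times = [2, 5, 10, 20, 40]
fast = [1 - math.exp(-0.10 * t) for t in times]   # e.g. a better linker
slow = [1 - math.exp(-0.01 * t) for t in times]   # e.g. a poorer linker

k_fast, k_slow = first_order_rate(times, fast), first_order_rate(times, slow)
print(f"k_fast = {k_fast:.3f}/min, k_slow = {k_slow:.3f}/min, "
      f"ratio = {k_fast / k_slow:.1f}")
```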



The present invention has of necessity been discussed herein by reference to certain specific methods
and materials. It is to be understood that the discussion of these specific methods and materials in no way
constitutes any limitation on the scope of the present invention, which extends to any and all alternative
materials and methods suitable for accomplishing the ends of the present invention.



All references cited herein are expressly incorporated by reference. 
SEQUENCE LISTING



(1) GENERAL INFORMATION: 

(i) APPLICANT: Genentech, Inc. 

(ii) TITLE OF INVENTION: SUBTILISIN VARIANTS CAPABLE OF CLEAVING
SUBSTRATES CONTAINING BASIC RESIDUES

(iii) NUMBER OF SEQUENCES: 89 

(iv) CORRESPONDENCE ADDRESS: 

(A) ADDRESSEE: Genentech, Inc. 

(B) STREET: 460 Point San Bruno Blvd
(C) CITY: South San Francisco

(D) STATE: California 

(E) COUNTRY: USA 

(F) ZIP: 94080 



(v) COMPUTER READABLE FORM: 

(A) MEDIUM TYPE: 3.5 inch, 1.44 Mb floppy disk

(B) COMPUTER: IBM PC compatible 

(C) OPERATING SYSTEM: PC-DOS/MS-DOS 

(D) SOFTWARE: WinPatin (Genentech) 



(vi) CURRENT APPLICATION DATA: 
(A) APPLICATION NUMBER:

(B) FILING DATE: 

(C) CLASSIFICATION: 



(vii) PRIOR APPLICATION DATA: 

(A) APPLICATION NUMBER: 08/398028
(B) FILING DATE: 03-MAR-1995

(viii) ATTORNEY/AGENT INFORMATION:

(A) NAME: Kubinec, Jeffrey S. 

(B) REGISTRATION NUMBER: 36,575 

(C) REFERENCE/DOCKET NUMBER: P0936P1PCT

(ix) TELECOMMUNICATION INFORMATION:

(A) TELEPHONE: 415/225-8228 

(B) TELEFAX: 415/952-9881 

(C) TELEX: 910/371-7168 



(2) INFORMATION FOR SEQ ID NO:1:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 8119 base pairs 

(B) TYPE: Nucleic Acid 

(C) STRANDEDNESS: Single 

(D) TOPOLOGY: Linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:



GAATTCNGGT CTACTAAAAT ATTATTCCAT ACTATACAAT TAATACACAG 50 

AATAATCTGT CTATTGGTTA TTCTGCAAAT GAAAAAAAGG AGAGGATAAA 100

GA GTG AGA GGC AAA AAA GTA TGG ATC AGT TTG CTG TTT 138
Val Arg Gly Lys Lys Val Trp Ile Ser Leu Leu Phe
-107 -105 -100

GCT TTA GCG TTA ATC TTT ACG ATG GCG TTC GGC AGC ACA 177
Ala Leu Ala Leu Ile Phe Thr Met Ala Phe Gly Ser Thr
-95 -90 -85

TCC TCT GCC CAG GCG GCA GGG AAA TCA AAC GGG GAA AAG 216
Ser Ser Ala Gln Ala Ala Gly Lys Ser Asn Gly Glu Lys
-80 -75 -70

AAA TAT ATT GTC GGG TTT AAA CAG ACA ATG AGC ACG ATG 255
Lys Tyr Ile Val Gly Phe Lys Gln Thr Met Ser Thr Met
-65 -60

AGC GCC GCT AAG AAG AAA GAT GTC ATT TCT GAA AAA GGC 294
Ser Ala Ala Lys Lys Lys Asp Val Ile Ser Glu Lys Gly
-55 -50 -45

GGG AAA GTG CAA AAG CAA TTC AAA TAT GTA GAC GCA GCT 333
Gly Lys Val Gln Lys Gln Phe Lys Tyr Val Asp Ala Ala
-40 -35

TCA GCT ACA TTA AAC GAA AAA GCT GTA AAA GAA TTG AAA 372
Ser Ala Thr Leu Asn Glu Lys Ala Val Lys Glu Leu Lys
-30 -25 -20

AAA GAC CCG AGC GTC GCT TAC GTT GAA GAA GAT CAC GTA 411
Lys Asp Pro Ser Val Ala Tyr Val Glu Glu Asp His Val
-15 -10 -5

GCA CAT GCG TAC GCG CAG TCC GTG CCT TAC GGC GTA TCA 450
Ala His Ala Tyr Ala Gln Ser Val Pro Tyr Gly Val Ser
1 5

CAA ATT AAA GCC CCT GCT CTG CAC TCT CAA GGC TAC ACT 489
Gln Ile Lys Ala Pro Ala Leu His Ser Gln Gly Tyr Thr
10 15 20

GGA TCA AAT GTT AAA GTA GCG GTT ATC GAC AGC GGT ATC 528
Gly Ser Asn Val Lys Val Ala Val Ile Asp Ser Gly Ile
25 30 35

GAT TCT TCT CAT CCT GAT TTA AAG GTA GCA GGC GGA GCC 567
Asp Ser Ser His Pro Asp Leu Lys Val Ala Gly Gly Ala
40 45

AGC ATG GTT CCT TCT GAA ACA AAT CCT TTC CAA GAC AAC 606
Ser Met Val Pro Ser Glu Thr Asn Pro Phe Gln Asp Asn
50 55 60

GAC TCT CAC GGA ACT CAC GTT GCC GGC ACA GTT GCG GCT 645
Asp Ser His Gly Thr His Val Ala Gly Thr Val Ala Ala
65 70

CTT AAT AAC TCA ATC GGT GTA TTA GGC GTT GCG CCA AGC 684
Leu Asn Asn Ser Ile Gly Val Leu Gly Val Ala Pro Ser
75 80 85

GCA TCA CTT TAC GCT GTA AAA GTT CTC GGT GCT GAC GGT 723
Ala Ser Leu Tyr Ala Val Lys Val Leu Gly Ala Asp Gly
90 95 100

TCC GGC CAA TAC AGC TGG ATC ATT AAC GGA ATC GAG TGG 762
Ser Gly Gln Tyr Ser Trp Ile Ile Asn Gly Ile Glu Trp
105 110

GCG ATC GCA AAC AAT ATG GAC GTT ATT AAC ATG AGC CTC 801
Ala Ile Ala Asn Asn Met Asp Val Ile Asn Met Ser Leu
115 120 125

GGC GGA CCT TCT GGT TCT GCT GCT TTA AAA GCG GCA GTT 840
Gly Gly Pro Ser Gly Ser Ala Ala Leu Lys Ala Ala Val
130 135

GAT AAA GCC GTT GCA TCC GGC GTC GTA GTC GTT GCG GCA 879
Asp Lys Ala Val Ala Ser Gly Val Val Val Val Ala Ala
140 145 150

GCC GGT AAC GAA GGC ACT TCC GGC AGC TCG TCG ACA GTG 918
Ala Gly Asn Glu Gly Thr Ser Gly Ser Ser Ser Thr Val
155 160 165

GAC TAC CCT GGC AAA TAC CCT TCT GTC ATT GCA GTA GGC 957
Asp Tyr Pro Gly Lys Tyr Pro Ser Val Ile Ala Val Gly
170 175

GCT GTT GAC AGC AGC AAC CAA AGA GCA TCT TTC TCA AGC 996
Ala Val Asp Ser Ser Asn Gln Arg Ala Ser Phe Ser Ser
180 185 190

GTA GGA CCT GAG CTT GAT GTC ATG GCA CCT GGC GTA TCT 1035
Val Gly Pro Glu Leu Asp Val Met Ala Pro Gly Val Ser
195 200

ATC CAA AGC ACG CTT CCT GGA AAC AAA TAC GGG GCG TAC 1074
Ile Gln Ser Thr Leu Pro Gly Asn Lys Tyr Gly Ala Tyr
205 210 215

AAC GGT ACC TCA ATG GCA TCT CCG CAC GTT GCC GGA GCG 1113
Asn Gly Thr Ser Met Ala Ser Pro His Val Ala Gly Ala
220 225 230

GCT GCT TTG ATT CTT TCT AAG CAC CCG AAC TGG ACA AAC 1152
Ala Ala Leu Ile Leu Ser Lys His Pro Asn Trp Thr Asn
235 240

ACT CAA GTC CGC AGC AGT TTA GAA AAC ACC ACT ACA AAA 1191
Thr Gln Val Arg Ser Ser Leu Glu Asn Thr Thr Thr Lys
245 250 255

CTT GGT GAT TCT TTC TAC TAT GGA AAA GGG CTG ATC AAC 1230
Leu Gly Asp Ser Phe Tyr Tyr Gly Lys Gly Leu Ile Asn
260 265

GTA CAG GCG GCA GCT CAG TA AAACATAAAA AACCGGCCTT 1270
Val Gln Ala Ala Ala Gln
270 275



GGCCCCGCCG GTTTTTTATT ATTTTTCTTC CTCCGCATGT TCAATCCGCT 1320
CCATAATCGA CGGATGGCTC CCTCTGAAAA TTTTAACGAG AAACGGCGGG 1370
TTGACCCGGC TCAGTCCCGT AACGGCCAAG TCCTGAAACG TCTCAATCGC 1420
CGCTTCCCGG TTTCCGGTCA GCTCAATGCC GTAACGGTCG GCGGCGTTTT 1470
CCTGATACCG GGAGACGGCA TTCGTAATCG GATCCGGAAA TTGTAAACGT 1520
TAATATTTTG TTAAAATTCG CGTTAAATTT TTGTTAAATC AGCTCATTTT 1570
TTAACCAATA GGCCGAAATC GGCAAAATCC CTTATAAATC AAAAGAATAG 1620
ACCGAGATAG GGTTGAGTGT TGTTCCAGTT TGGAACAAGA GTCCACTATT 1670
AAAGAACGTG GACTCCAACG TCAAAGGGCG AAAAACCGTC TATCAGGGCT 1720
ATGGCCCACT ACGTGAACCA TCACCCTAAT CAAGTTTTTT GGGGTCGAGG 1770
TGCCGTAAAG CACTAAATCG GAACCCTAAA GGGAGCCCCC GATTTAGAGC 1820 
TTGACGGGGA AAGCCGGCGA ACGTGGCGAG AAAGGAAGGG AAGAAAGCGA 1870 
AAGGAGCGGG CGCTAGGGCG CTGGCAAGTG TAGCGGTCAC GCTGCGCGTA 1920 
ACCACCACAC CCGCCGCGCT TAATGCGCCG CTACAGGGCG CGTCCGGATC 1970
NGATCCGACG CGAGGCTGGA TGGCCTTCCC CATTATGATT CTTCTCGCTT 2020 
CCGGCGGCAT CGGGATGCCC GCGTTGCAGG CCATGCTGTC CAGGCAGGTA 2070
GATGACGACC ATCAGGGACA GCTTCAAGGA TCGCTCGCGG CTCTTACCAG 2120 
CCTAACTTCG ATCACTGGAC CGCTGATCGT CACGGCGATT TATGCCGCCT 2170 
CGGCGAGCAC ATGGAACGGG TTGGCATGGA TTGTAGGCGC CGCCCTATAC 2220
CTTGTCTGCC TCCCCGCGTT GCGTCGCGGT GCATGGAGCC GGGCCACCTC 2270 
GACCTGAATG GAAGCCGGCG GCACCTCGCT AACGGATTCA CCACTCCAAG 2320 
AATTGGAGCC AATCAATTCT TGCGGAGAAC TGTGAATGCG CAAACCAACC 2370 
CTTGGCAGAA CATATCCATC GCGTCCGCCA TCTCCAGCAG CCGCACGCGG 2420
CGCATCTCGG GCCGCGTTGC TGGCGTTTTT CCATAGGCTC CGCCCCCCTG 2470
ACGAGCATCA CAAAAATCGA CGCTCAAGTC AGAGGTGGCG AAACCCGACA 2520 
GGACTATAAA GATACCAGGC GTTTCCCCCT GGAAGCTCCC TCGTGCGCTC 2570 
TCCTGTTCCG ACCCTGCCGC TTACCGGATA CCTGTCCGCC TTTCTCCCTT 2620
CGGGAAGCGT GGCGCTTTCT CAATGCTCAC GCTGTAGGTA TCTCAGTTCG 2670 
GTGTAGGTCG TTCGCTCCAA GCTGGGCTGT GTGCACGAAC CCCCCGTTCA 2720
GCCCGACCGC TGCGCCTTAT CCGGTAACTA TCGTCTTGAG TCCAACCCGG 2770 
TAAGACACGA CTTATCGCCA CTGGCAGCAG CCACTGGTAA CAGGATTAGC 2820 
AGAGCGAGGT ATGTAGGCGG TGCTACAGAG TTCTTGAAGT GGTGGCCTAA 2870 
CTACGGCTAC ACTAGAAGGA CAGTATTTGG TATCTGCGCT CTGCTGAAGC 2920
CAGTTACCTT CGGAAAAAGA GTTGGTAGCT CTTGATCCGG CAAACAAACC 2970
ACCGCTGGTA GCGGTGGTTT TTTTGTTTGC AAGCAGCAGA TTACGCGCAG 3020 
AAAAAAAGGA TCTCAAGAAG ATCCTTTGAT CTTTTCTACG GGGTCTGACG 3070 
CTCAGTGGAA CGAAAACTCA CGTTAAGGGA TTTTGGTCAT GAGATTATCA 3120 
AAAAGGATCT TCACCTAGAT CCTTTTAAAT TAAAAATGAA GTTTTAAATC 3170 
AATCTAAAGT ATATATGAGT AAACTTGGTC TGACAGTTAC CAATGCTTAA 3220
TCAGTGAGGC ACCTATCTCA GCGATCTGTC TATTTCGTTC ATCCATAGTT 3270 
GCCTGACTCC CCGTCGTGTA GATAACTACG ATACGGGAGG GCTTACCATC 3320 
TGGCCCCAGT GCTGCAATGA TACCGCGAGA CCCACGCTCA CCGGCTCCAG 3370 
ATTTATCAGC AATAAACCAG CCAGCCGGAA GGGCCGAGCG CAGAAGTGGT 3420
CCTGCAACTT TATCCGCCTC CATCCAGTCT ATTAATTGTT GCCGGGAAGC 3470
TAGAGTAAGT AGTTCGCCAG TTAATAGTTT GCGCAACGTT GTTGCCATTG 3520 
CTGCAGGCAT CGTGGTGTCA CGCTCGTCGT TTGGTATGGC TTCATTCAGC 3570 
TCCGGTTCCC AACGATCAAG GCGAGTTACA TGATCCCCCA TGTTGTGCAA 3620
AAAAGCGGTT AGCTCCTTCG GTCCTCCGAT CGTTGTCAGA AGTAAGTTGG 3670 
CCGCAGTGTT ATCACTCATG GTTATGGCAG CACTGCATAA TTCTCTTACT 3720 
GTCATGCCAT CCGTAAGATG CTTTTCTGTG ACTGGTGAGT ACTCAACCAA 3770 
GTCATTCTGA GAATAGTGTA TGCGGCGACC GAGTTGCTCT TGCCCGGCGT 3820 
CAACACGGGA TAATACCGCG CCACATAGCA GAACTTTAAA AGTGCTCATC 3870
ATTGGAAAAC GTTCTTCGGG GCGAAAACTC TCAAGGATCT TACCGCTGTT 3920 
GAGATCCAGT TCGATGTAAC CCACTCGTGC ACCCAACTGA TCTTCAGCAT 3970 
CTTTTACTTT CACCAGCGTT TCTGGGTGAG CAAAAACAGG AAGGCAAAAT 4020 
GCCGCAAAAA AGGGAATAAG GGCGACACGG AAATGTTGAA TACTCATACT 4070
CTTCCTTTTT CAATATTATT GAAGCATTTA TCAGGGTTAT TGTCTCATGA 4120
GCGGATACAT ATTTGAATGT ATTTAGAAAA ATAAACAAAT AGGGGTTCCG 4170 
CGCACATTTC CCCGAAAAGT GCCACCTGAC GTCTAAGAAA CCATTATTAT 4220 
CATGACATTA ACCTATAAAA ATAGGCGTAT CACGAGGCCC TTTCGTCTTC 4270 
AAGAATTAAT TCCTTAAGGA ACGTACAGAC GGCTTAAAAG CCTTTAAAAA 4320
CGTTTTTAAG GGGTTTGTAG ACAAGGTAAA GGATAAAACA GCACAATTCC 4370
AAGAAAAACA CGATTTAGAA CCTAAAAAGA ACGAATTTGA ACTAACTCAT 4420
AACCGAGAGG TAAAAAAAGA ACGAAGTCGA GATCAGGGAA TGAGTTTATA 4470
AAATAAAAAA AGCACCTGAA AAGGTGTCTT TTTTTGATGG TTTTGAACTT 4520 
GTTCTTTCTT ATCTTGATAC ATATAGAAAT AACGTCATTT TTATTTTAGT 4570
TGCTGAAAGG TGCGTTGAAG TGTTGGTATG TATGTGTTTT AAAGTATTGA 4620
AAACCCTTAA AATTGGTTGC ACAGAAAAAC CCCATCTGTT AAAGTTATAA 4670 
GTGACTAAAC AAATAACTAA ATAGATGGGG GTTTCTTTTA ATATTATGTG 4720 
TCCTAATAGT AGCATTTATT CAGATGAAAA ATCAAGGGTT TTAGTGGACA 4770 
AGACAAAAAG TGGAAAAGTG AGACCATGGA GAGAAAAGAA AATCGCTAAT 4820 
GTTGATTACT TTGAACTTCT GCATATTCTT GAATTTAAAA AGGCTGAAAG 4870
AGTAAAAGAT TGTGCTGAAA TATTAGAGTA TAAACAAAAT CGTGAAACAG 4920
GCGAAAGAAA GTTGTATCGA GTGTGGTTTT GTAAATCCAG GCTTTGTCCA 4970
ATGTGCAACT GGAGGAGAGC AATGAAACAT GGCATTCAGT CACAAAAGGT 5020 
TGTTGCTGAA GTTATTAAAC AAAAGCCAAC AGTTCGTTGG TTGTTTCTCA 5070
CATTAACAGT TAAAAATGTT TATGATGGCG AAGAATTAAA TAAGAGTTTG 5120 
TCAGATATGG CTCAAGGATT TCGCCGAATG ATGCAATATA AAAAAATTAA 5170 
TAAAAATCTT GTTGGTTTTA TGCGTGCAAC GGAAGTGACA ATAAATAATA 5220 
AAGATAATTC TTATAATCAG CACATGCATG TATTGGTATG TGTGGAACCA 5270
ACTTATTTTA AGAATACAGA AAACTACGTG AATCAAAAAC AATGGATTCA 5320 
ATTTTGGAAA AAGGCAATGA AATTAGACTA TGATCCAAAT GTAAAAGTTC 5370 
AAATGATTCG ACCGAAAAAT AAATATAAAT CGGATATACA ATCGGCAATT 5420 
GACGAAACTG CAAAATATCC TGTAAAGGAT ACGGATTTTA TGACCGATGA 5470
TGAAGAAAAG AATTTGAAAC GTTTGTCTGA TTTGGAGGAA GGTTTACACC 5520
GTAAAAGGTT AATCTCCTAT GGTGGTTTGT TAAAAGAAAT ACATAAAAAA 5570
TTAAACCTTC ATGACACAGA AGAAGGCGAT TTGATTCATA CAGATGATGA 5620
CGAAAAAGCC GATGAAGATG GATTTTCTAT TATTGCAATG TGGAATTGGG 5670 
AACGGAAAAA TTATTTTATT AAAGAGTAGT TCAACAAACG GGCCAGTTTG 5720 
TTGAAGATTA GATGCTATAA TTGTTATTAA AAGGATTGAA GGATGCTTAG 5770
GAAGACGAGT TATTAATAGC TGAATAAGAA CGGTGCTCTC CAAATATTCT 5820 
TATTTAGAAA AGCAAATCTA AAATTATCTG AAAAGGGAAT GAGAATAGTG 5870 
AATGGACCAA TAATAATGAC TAGAGAAGAA AGAATGAAGA TTGTTCATGA 5920 
AATTAAGGAA CGAATATTGG ATAAATATGG GGATGATGTT AAGGCTATTG 5970 
GTGTTTATGG CTCTCTTGGT CGTCAGACTG ATGGGCCCTA TTCGGATATT 6020
GAGATGATGT GTGTCATGTC AACAGAGGAA GCAGAGTTCA GCCATGAATG 6070 
GACAACCGGT GAGTGGAAGG TGGAAGTGAA TTTTGATAGC GAAGAGATTC 6120 
TACTAGATTA TGCATCTCAG GTGGAATCAG ATTGGCCGCT TACACATGGT 6170 
CAATTTTTCT CTATTTTGCC GATTTATGAT TCAGGTGGAT ACTTAGAGAA 6220 
AGTGTATCAA ACTGCTAAAT CGGTAGAAGC CCAAACGTTC CACGATGCGA 6270
TTTGTGCCCT TATCGTAGAA GAGCTGTTTG AATATGCAGG CAAATGGCGT 6320 
AATATTCGTG TGCAAGGACC GACAACATTT CTACCATCCT TGACTGTACA 6370 
GGTAGCAATG GCAGGTGCCA TGTTGATTGG TCTGCATCAT CGCATCTGTT 6420 
ATACGACGAG CGCTTCGGTC TTAACTGAAG CAGTTAAGCA ATCAGATCTT 6470 
CCTTCAGGTT ATGACCATCT GTGCCAGTTC GTAATGTCTG GTCAACTTTC 6520
CGACTCTGAG AAACTTCTGG AATCGCTAGA GAATTTCTGG AATGGGATTC 6570 
AGGAGTGGAC AGAACGACAC GGATATATAG TGGATGTGTC AAAACGCATA 6620 
CCATTTTGAA CGATGACCTC TAATAATTGT TAATCATGTT GGTTACGTAT 6670 
TTATTAACTT CTCCTAGTAT TAGTAATTAT CATGGCTGTC ATGGCGCATT 6720
AACGGAATAA AGGGTGTGCT TAAATCGGGC CATTTTGCGT AATAAGAAAA 6770 
AGGATTAATT ATGAGCGAAT TGAATTAATA ATAAGGTAAT AGATTTACAT 6820
TAGAAAATGA AAGGGGATTT TATGCGTGAG AATGTTACAG TCTATCCCGG 6870 
CAATAGTTAC CCTTATTATC AAGATAAGAA AGAAAAGGAT TTTTCGCTAC 6920
GCTCAAATCC TTTAAAAAAA CACAAAAGAC CACATTTTTT AATGTGGTCT 6970 
TTATTCTTCA ACTAAAGCAC CCATTAGTTC AACAAACGAA AATTGGATAA 7020 
AGTGGGATAT TTTTAAAATA TATATTTATG TTACAGTAAT ATTGACTTTT 7070 
AAAAAAGGAT TGATTCTAAT GAAGAAAGCA GACAAGTAAG CCTCCTAAAT 7120 
TCACTTTAGA TAAAAATTTA GGAGGCATAT CAAATGAACT TTAATAAAAT 7170
TGATTTAGAC AATTGGAAGA GAAAAGAGAT ATTTAATCAT TATTTGAACC 7220 
AACAAACGAC TTTTAGTATA ACCACAGAAA TTGATATTAG TGTTTTATAC 7270
CGAAACATAA AACAAGAAGG ATATAAATTT TACCCTGCAT TTATTTTCTT 7320 
AGTGACAAGG GTGATAAACT CAAATACAGC TTTTAGAACT GGTTACAATA 7370 
GCGACGGAGA GTTAGGTTAT TGGGATAAGT TAGAGCCACT TTATACAATT 7420
TTTGATGGTG TATCTAAAAC ATTCTCTGGT ATTTGGACTC CTGTAAAGAA 7470
TGACTTCAAA GAGTTTTATG ATTTATACCT TTCTGATGTA GAGAAATATA 7520 
ATGGTTCGGG GAAATTGTTT CCCAAAACAC CTATACCTGA AAATGCTTTT 7570 
TCTCTTTCTA TTATTCCATG GACTTCATTT ACTGGGTTTA ACTTAAATAT 7620 
CAATAATAAT AGTAATTACC TTCTACCCAT TATTACAGCA GGAAAATTCA 7670
TTAATAAAGG TAATTCAATA TATTTACCGC TATCTTTACA GGTACATCAT 7720 
TCTGTTTGTG ATGGTTATCA TGCAGGATTG TTTATGAACT CTATTCAGGA 7770 
ATTGTCAGAT AGGCCTAATG ACTGGCTTTT ATAATATGAG ATAATGCCGA 7820 
CTGTACTTTT TACAGTCGGT TTTCTAATGT CACTAACCTG CCCCGTTAGT 7870 
TGAAGAAGGT TTTTATATTA CAGCTCCAGA TCCATATCCT TCTTTTTCTG 7920
AACCGACTTC TCCTTTTTCG CTTCTTTATT CCAATTGCTT TATTGACGTT 7970 
GAGCCTCGGA ACCCNTATAG TGTGTTATAC TTTACTTGGA AGTGGTTGCC 8020 
GGAAAGAGCG AAAATGCCTC ACATTTGTGC CACCTAAAAA GGAGCGATTT 8070 
ACATATGAGT TATGCAGTTT GTAGAATGCA AAAAGTGAAA TCAGGATCN 8119 
(2) INFORMATION FOR SEQ ID NO:2:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 382 amino acids

(B) TYPE: Amino Acid
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:

Val Arg Gly Lys Lys Val Trp Ile Ser Leu Leu Phe Ala Leu Ala
-107 -105 -100 -95

Leu Ile Phe Thr Met Ala Phe Gly Ser Thr Ser Ser Ala Gln Ala
-90 -85 -80

Ala Gly Lys Ser Asn Gly Glu Lys Lys Tyr Ile Val Gly Phe Lys
-75 -70 -65

Gln Thr Met Ser Thr Met Ser Ala Ala Lys Lys Lys Asp Val Ile
-60 -55 -50

Ser Glu Lys Gly Gly Lys Val Gln Lys Gln Phe Lys Tyr Val Asp
-45 -40 -35

Ala Ala Ser Ala Thr Leu Asn Glu Lys Ala Val Lys Glu Leu Lys
-30 -25 -20

Lys Asp Pro Ser Val Ala Tyr Val Glu Glu Asp His Val Ala His
-15 -10 -5

Ala Tyr Ala Gln Ser Val Pro Tyr Gly Val Ser Gln Ile Lys Ala
1 5 10

Pro Ala Leu His Ser Gln Gly Tyr Thr Gly Ser Asn Val Lys Val
15 20 25

Ala Val Ile Asp Ser Gly Ile Asp Ser Ser His Pro Asp Leu Lys
30 35 40

Val Ala Gly Gly Ala Ser Met Val Pro Ser Glu Thr Asn Pro Phe
45 50 55

Gln Asp Asn Asp Ser His Gly Thr His Val Ala Gly Thr Val Ala
60 65 70

Ala Leu Asn Asn Ser Ile Gly Val Leu Gly Val Ala Pro Ser Ala
75 80 85

Ser Leu Tyr Ala Val Lys Val Leu Gly Ala Asp Gly Ser Gly Gln
90 95 100

Tyr Ser Trp Ile Ile Asn Gly Ile Glu Trp Ala Ile Ala Asn Asn
105 110 115

Met Asp Val Ile Asn Met Ser Leu Gly Gly Pro Ser Gly Ser Ala
120 125 130

Ala Leu Lys Ala Ala Val Asp Lys Ala Val Ala Ser Gly Val Val
135 140 145

Val Val Ala Ala Ala Gly Asn Glu Gly Thr Ser Gly Ser Ser Ser
150 155 160

Thr Val Asp Tyr Pro Gly Lys Tyr Pro Ser Val Ile Ala Val Gly
165 170 175

Ala Val Asp Ser Ser Asn Gln Arg Ala Ser Phe Ser Ser Val Gly
180 185 190

Pro Glu Leu Asp Val Met Ala Pro Gly Val Ser Ile Gln Ser Thr
195 200 205

Leu Pro Gly Asn Lys Tyr Gly Ala Tyr Asn Gly Thr Ser Met Ala
210 215 220

Ser Pro His Val Ala Gly Ala Ala Ala Leu Ile Leu Ser Lys His
225 230 235

Pro Asn Trp Thr Asn Thr Gln Val Arg Ser Ser Leu Glu Asn Thr
240 245 250

Thr Thr Lys Leu Gly Asp Ser Phe Tyr Tyr Gly Lys Gly Leu Ile
255 260 265

Asn Val Gln Ala Ala Ala Gln
270 275

(2) INFORMATION FOR SEQ ID NO: 3:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 7 amino acids 

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:

Ser Leu Gly Gly Pro Ser Gly 
1 5 7 

(2) INFORMATION FOR SEQ ID NO: 4: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 7 amino acids 

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4: 

Ala Ala Ala Gly Asn Glu Gly
1 5 7 

(2) INFORMATION FOR SEQ ID NO: 5: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 6 amino acids 
(B) TYPE: Amino Acid

(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:

Ser Thr Val Gly Tyr Pro 
1 5 6 

(2) INFORMATION FOR SEQ ID NO: 6:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 7 amino acids 

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:

Ser Trp Gly Pro Ala Asp Asp
1 5 7

(2) INFORMATION FOR SEQ ID NO:7:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 7 amino acids 

(B) TYPE: Amino Acid
(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:

Phe Ala Ser Gly Asn Gly Gly 
1 5 7 

(2) INFORMATION FOR SEQ ID NO:8:

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 7 amino acids

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8: 

Cys Asn Tyr Asp Gly Tyr Thr 
1 5 7

(2) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 7 amino acids 
(B) TYPE: Amino Acid
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9: 

Ser Trp Gly Pro Glu Asp Asp 
1 5 7 

(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 7 amino acids 

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10:

Trp Ala Ser Gly Asn Gly Gly
1 5 7 

(2) INFORMATION FOR SEQ ID NO: 11: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 7 amino acids 
(B) TYPE: Amino Acid

(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:

Cys Asn Cys Asp Gly Tyr Thr 
1 5 7 

(2) INFORMATION FOR SEQ ID NO: 12:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 7 amino acids 

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12:

Trp Ala Ser Gly Asp Gly Gly
1 5 7

(2) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 7 amino acids

(B) TYPE: Amino Acid
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13:

Cys Asn Cys Asp Gly Tyr Ala
1 5 

(2) INFORMATION FOR SEQ ID NO: 14: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 6 amino acids 
(B) TYPE: Amino Acid

(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14: 

Val Ile Asp Ser Gly Ile
1 5 6 

(2) INFORMATION FOR SEQ ID NO: 15:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 5 amino acids 
(B) TYPE: Amino Acid

(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15:

Asp Asn Asn Ser His 
1 5 

(2) INFORMATION FOR SEQ ID NO: 16: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 6 amino acids

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16:

Ile Val Asp Asp Gly Leu
1 5 6

(2) INFORMATION FOR SEQ ID NO: 17: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 5 amino acids

(B) TYPE: Amino Acid
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17:

Ser Asp Asp Tyr His
1 5

(2) INFORMATION FOR SEQ ID NO: 18:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 6 amino acids

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18: 

Ile Leu Asp Asp Gly Ile
1 5 6

(2) INFORMATION FOR SEQ ID NO: 19: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 5 amino acids 

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19: 

Asn Asp Asn Arg His
1 5 

(2) INFORMATION FOR SEQ ID NO: 20: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 6 amino acids 
(B) TYPE: Amino Acid

(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:20: 

Ile Met Asp Asp Gly Ile
1 5 6 

(2) INFORMATION FOR SEQ ID NO:21:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 5 amino acids 

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21:

Trp Phe Asn Ser His 
1 5 

(2) INFORMATION FOR SEQ ID NO: 22: 

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 27 base pairs

(B) TYPE: Nucleic Acid 

(C) STRANDEDNESS : Single 

(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22: 
GCGGTTATCG ACGACGGTAT CGATTCT 27

(2) INFORMATION FOR SEQ ID NO: 23: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 27 base pairs
(B) TYPE: Nucleic Acid

(C) STRANDEDNESS : Single 

(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23: 



GCGGTTATCG ACAAAGGTAT CGATTCT 27
(2) INFORMATION FOR SEQ ID NO: 24: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 27 base pairs

(B) TYPE: Nucleic Acid 
(C) STRANDEDNESS: Single

(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24: 



GCGGTTATCG ACGAAGGTAT CGATTCT 27 
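The three oligonucleotides of SEQ ID NOS: 22 to 24 differ from one another at a single codon. As an illustration only (not part of the original disclosure), the Python sketch below compares each primer with the corresponding 27-nucleotide stretch of the coding sequence as it appears in SEQ ID NO: 74 (GCG GTT ATC GAC AGC GGT ATC GAT TCT, Ala-Val-Ile-Asp-Ser-Gly-Ile-Asp-Ser) and reports each altered codon under the standard genetic code. It assumes the primers are written in the same reading frame as that stretch; the names codon_changes, WILD_TYPE and PRIMERS are hypothetical.

```python
# Illustrative sketch (not from the original disclosure): compare mutagenic
# primers with the wild-type stretch they anneal to and list codon changes.

BASES = "TCAG"
# Standard genetic code as one-letter residues for codons in TCAG order
# (TTT, TTC, TTA, TTG, TCT, ..., GGG); "*" marks a stop codon.
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "Stop",
}


def codon_changes(wild_type, mutant, frame=0):
    """List (codon no., wt codon, mutant codon, wt residue, mutant residue)
    for each complete codon, read from offset `frame`, that differs."""
    if len(wild_type) != len(mutant):
        raise ValueError("sequences must have the same length")
    changes = []
    for n, start in enumerate(range(frame, len(wild_type) - 2, 3), start=1):
        wt, mt = wild_type[start:start + 3], mutant[start:start + 3]
        if wt != mt:
            changes.append((n, wt, mt, THREE_LETTER[CODON_TABLE[wt]],
                            THREE_LETTER[CODON_TABLE[mt]]))
    return changes


# Wild-type stretch transcribed from SEQ ID NO: 74 (GCG GTT ... GAT TCT).
WILD_TYPE = "GCGGTTATCGACAGCGGTATCGATTCT"
PRIMERS = {
    22: "GCGGTTATCGACGACGGTATCGATTCT",
    23: "GCGGTTATCGACAAAGGTATCGATTCT",
    24: "GCGGTTATCGACGAAGGTATCGATTCT",
}

if __name__ == "__main__":
    for seq_id, primer in PRIMERS.items():
        print(seq_id, codon_changes(WILD_TYPE, primer))
```

Run as written, the sketch reports one change per primer, at codon 5: AGC replaced by GAC, AAA or GAA (Ser to Asp, Lys or Glu). Counting residues in SEQ ID NO: 72, codon 5 of this stretch corresponds to Ser33 of the mature sequence.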
(2) INFORMATION FOR SEQ ID NO: 25: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 23 base pairs 

(B) TYPE: Nucleic Acid 

(C) STRANDEDNESS: Single 

(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25:



CCAAGACAAC GACTCTCACG GAA 23 

(2) INFORMATION FOR SEQ ID NO: 26: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 23 base pairs 
(B) TYPE: Nucleic Acid

(C) STRANDEDNESS: Single 

(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26: 



CCAAGACAAC AGCTCTCACG GAA 23 
(2) INFORMATION FOR SEQ ID NO: 27:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: Nucleic Acid 

(C) STRANDEDNESS: Single 
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27:



CCAAGACAAC AAATCTCACG GAA 23 
(2) INFORMATION FOR SEQ ID NO: 28: 



(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 42 base pairs
(B) TYPE: Nucleic Acid

(C) STRANDEDNESS: Single

(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28: 



CACTTCCGGC AGCTCGTCGA CAGTGGACTA CCCTGGCAAA TA 42 
(2) INFORMATION FOR SEQ ID NO: 29: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 42 base pairs 

(B) TYPE: Nucleic Acid 

(C) STRANDEDNESS: Single 

(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29: 



CACTTCCGGC AGCTCGTCGA CAGTGGAGTA CCCTGGCAAA TA 42 
(2) INFORMATION FOR SEQ ID NO: 30: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 41 base pairs 

(B) TYPE: Nucleic Acid 

(C) STRANDEDNESS: Single 
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30:



TTAACATGAG CCTCGGCCCA GCTAGCGGTT CTGCTGCTTT A 41 

(2) INFORMATION FOR SEQ ID NO: 31: 

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 43 base pairs
(B) TYPE: Nucleic Acid

(C) STRANDEDNESS: Single
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31: 



TTAACATGAG CCTCGGCCCC GCGGATGATT CTGCTGCTTT AAA 43
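SEQ ID NOS: 30 and 31 alter several adjacent codons at once. The illustrative sketch below applies the same kind of codon comparison (standard genetic code; the names codon_changes, WILD_TYPE and PRIMERS are hypothetical) against the matching 43-nucleotide stretch transcribed from SEQ ID NO: 74. That alignment is an assumption of the example: the primers are taken to begin with the last two bases of the ATT (Ile) codon, so codons are read from the third base.

```python
# Illustrative sketch (not from the original disclosure): list the codons that
# SEQ ID NOS: 30 and 31 change relative to the wild-type stretch of SEQ ID NO: 74.

BASES = "TCAG"
# Standard genetic code as one-letter residues for codons in TCAG order
# (TTT, TTC, TTA, TTG, TCT, ..., GGG); "*" marks a stop codon.
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "Stop",
}


def codon_changes(wild_type, mutant, frame=0):
    """List (codon no., wt codon, mutant codon, wt residue, mutant residue)
    for each complete codon, read from offset `frame`, that differs."""
    if len(wild_type) != len(mutant):
        raise ValueError("sequences must have the same length")
    changes = []
    for n, start in enumerate(range(frame, len(wild_type) - 2, 3), start=1):
        wt, mt = wild_type[start:start + 3], mutant[start:start + 3]
        if wt != mt:
            changes.append((n, wt, mt, THREE_LETTER[CODON_TABLE[wt]],
                            THREE_LETTER[CODON_TABLE[mt]]))
    return changes


# Assumed alignment, transcribed from SEQ ID NO: 74:
# TT | AAC ATG AGC CTC GGC GGA CCT TCT GGT TCT GCT GCT TTA | AA
WILD_TYPE = "TTAACATGAGCCTCGGCGGACCTTCTGGTTCTGCTGCTTTAAA"
PRIMERS = {
    30: "TTAACATGAGCCTCGGCCCAGCTAGCGGTTCTGCTGCTTTA",
    31: "TTAACATGAGCCTCGGCCCCGCGGATGATTCTGCTGCTTTAAA",
}

if __name__ == "__main__":
    for seq_id, primer in PRIMERS.items():
        # Trim the reference to the primer length; codons start at offset 2.
        print(seq_id, codon_changes(WILD_TYPE[:len(primer)], primer, frame=2))
```

For SEQ ID NO: 30 the sketch finds Gly-Pro-Ser replaced by Pro-Ala-Ser, the last change being silent (TCT to AGC); for SEQ ID NO: 31 it finds Gly-Pro-Ser-Gly replaced by Pro-Ala-Asp-Asp. By residue count in SEQ ID NO: 72, codons 6 to 9 of this stretch correspond to Gly128 through Gly131.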
(2) INFORMATION FOR SEQ ID NO: 32:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: Nucleic Acid 

(C) STRANDEDNESS: Single 
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32: 



CGGCAGCTCA AGCAACGATG GCTATCCTGG CAAATACCCT TCTGTCA 47

(2) INFORMATION FOR SEQ ID NO: 33: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 44 base pairs
(B) TYPE: Nucleic Acid

(C) STRANDEDNESS: Single 

(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33: 



ACTTCCGGCA GCTCTTCGAA QTACGACGGG TACCCTGGCA AATA 44
(2) INFORMATION FOR SEQ ID NO: 34: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 5 amino acids 

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34: 

Asn Leu Thr Ala Arg 
1 5 

(2) INFORMATION FOR SEQ ID NO: 35: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 5 amino acids 
(B) TYPE: Amino Acid
(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35: 

Asn Leu Met Arg Lys
1 5 

(2) INFORMATION FOR SEQ ID NO: 36: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 5 amino acids
(B) TYPE: Amino Acid

(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36: 

Thr Ala Ser Arg Arg 
1 5 

(2) INFORMATION FOR SEQ ID NO: 37:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 5 amino acids 

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37:

Leu Thr Arg Arg Ser 
1 5 

(2) INFORMATION FOR SEQ ID NO:38: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 5 amino acids

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:38:

Ala Leu Ser Arg Lys
1 5 

(2) INFORMATION FOR SEQ ID NO: 39: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 5 amino acids

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 39: 

Leu Met Leu Arg Lys 
1 5

(2) INFORMATION FOR SEQ ID NO: 40:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 5 amino acids 

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 40: 

Ala Ser Thr His Phe 
1 5 

(2) INFORMATION FOR SEQ ID NO: 41: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 5 amino acids 

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 41:

Gln Lys Pro Asn Phe
1 5 

(2) INFORMATION FOR SEQ ID NO: 42: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 5 amino acids 
(B) TYPE: Amino Acid

(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 42:

Arg Lys Pro Thr His 
1 5 

(2) INFORMATION FOR SEQ ID NO: 43:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 5 amino acids 

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 43:

Ile Gln Gln Gln Tyr
1 5 

(2) INFORMATION FOR SEQ ID NO: 44: 
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 5 amino acids 

(B) TYPE: Amino Acid
(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 44:

Arg Pro Gly Ala Met
1 5 

(2) INFORMATION FOR SEQ ID NO: 45:

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 5 amino acids

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 45: 

Gln Gly Glu Leu Pro
1 5

(2) INFORMATION FOR SEQ ID NO: 46: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 5 amino acids 

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 46: 

Ala Pro Asp Pro Thr 
1 5 

(2) INFORMATION FOR SEQ ID NO: 47: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 5 amino acids 

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 47: 

Gln Leu Leu Glu His
1 5

(2) INFORMATION FOR SEQ ID NO: 48:

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 5 amino acids 
(B) TYPE: Amino Acid

(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 48: 

Val Asn Asn Asn His
1 5

(2) INFORMATION FOR SEQ ID NO: 49:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 5 amino acids 

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 49:

Ala Gln Ser Asn Leu
1 5 

(2) INFORMATION FOR SEQ ID NO: 50: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 5 amino acids 

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:50:

Thr Ala Ser Arg Arg
1 5 

(2) INFORMATION FOR SEQ ID NO: 51: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 6 amino acids 
(B) TYPE: Amino Acid

(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:51: 

His His His His His His 
1 5 6 

(2) INFORMATION FOR SEQ ID NO: 52:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 4 amino acids 

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:52:

Leu Met Arg Lys 
1 4 

(2) INFORMATION FOR SEQ ID NO: 53:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 4 amino acids

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 53: 

Leu Thr Ala Arg 
1 4

(2) INFORMATION FOR SEQ ID NO: 54: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 4 amino acids 

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 54: 

Gly Pro Gly Gly 
1 4 
(2) INFORMATION FOR SEQ ID NO:55:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 5 amino acids 

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:55: 

Gly Leu Met Arg Lys 
1 5 

(2) INFORMATION FOR SEQ ID NO: 56: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 4 amino acids

(B) TYPE: Amino Acid
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 56:

Ala Ala Pro Phe
1 4 

(2) INFORMATION FOR SEQ ID NO: 57:

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 13 amino acids 
(B) TYPE: Amino Acid

(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:57: 

Gly Pro Gly Gly Xaa Xaa Xaa Xaa Xaa Gly Gly Pro Gly 
1 5 10 13

(2) INFORMATION FOR SEQ ID NO: 58:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 4 amino acids 

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:58:

Ala Ala Pro Lys 
1 4 

(2) INFORMATION FOR SEQ ID NO: 59: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 4 amino acids

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:59: 

Ala Ala Pro Arg 
1 4

(2) INFORMATION FOR SEQ ID NO: 60: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 4 amino acids 

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 60:

Ala Ala Pro Met 
1 4 

(2) INFORMATION FOR SEQ ID NO: 61:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 4 amino acids 

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:61:

Ala Ala Pro Gln
1 4

(2) INFORMATION FOR SEQ ID NO: 62:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 4 amino acids

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 62: 

Ala Ala Lys Phe 
1 4

(2) INFORMATION FOR SEQ ID NO: 63: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 4 amino acids 

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 63: 

Ala Ala Ala Phe 
1 4 

(2) INFORMATION FOR SEQ ID NO: 64: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 4 amino acids 

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 64: 

Ala Ala Arg Phe
1 4 

(2) INFORMATION FOR SEQ ID NO: 65: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 4 amino acids 
(B) TYPE: Amino Acid

(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 65:

Ala Ala Asp Phe 

(2) INFORMATION FOR SEQ ID NO: 66:

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 4 amino acids 
(B) TYPE: Amino Acid

(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 66: 

Ala Ala Lys Lys 
1 4 

(2) INFORMATION FOR SEQ ID NO: 67:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 4 amino acids

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 67:

Ala Ala Lys Arg 
1 4

(2) INFORMATION FOR SEQ ID NO: 68: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 4 amino acids

(B) TYPE: Amino Acid
(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:68: 

Ala Ala Lys Phe 
1 4

(2) INFORMATION FOR SEQ ID NO: 69: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 4 amino acids 
(B) TYPE: Amino Acid
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 69: 

Ala Ala Pro Xaa 
1 4 

(2) INFORMATION FOR SEQ ID NO: 70: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 4 amino acids 

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:70: 

Ala Ala Xaa Phe
1 4 

(2) INFORMATION FOR SEQ ID NO: 71: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 5 amino acids

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 71: 

Ala Ala Xaa Xaa Xaa
1 5 

(2) INFORMATION FOR SEQ ID NO: 72: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 275 amino acids 
(B) TYPE: Amino Acid

(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:72: 

Ala Gln Ser Val Pro Tyr Gly Val Ser Gln Ile Lys Ala Pro Ala
1 5 10 15

Leu His Ser Gln Gly Tyr Thr Gly Ser Asn Val Lys Val Ala Val
20 25 30

Ile Asp Ser Gly Ile Asp Ser Ser His Pro Asp Leu Lys Val Ala
35 40 45

Gly Gly Ala Ser Met Val Pro Ser Glu Thr Asn Pro Phe Gln Asp
50 55 60

Asn Asp Ser His Gly Thr His Val Ala Gly Thr Val Ala Ala Leu
65 70 75

Asn Asn Ser Ile Gly Val Leu Gly Val Ala Pro Ser Ala Ser Leu
80 85 90

Tyr Ala Val Lys Val Leu Gly Ala Asp Gly Ser Gly Gln Tyr Ser
95 100 105

Trp Ile Ile Asn Gly Ile Glu Trp Ala Ile Ala Asn Asn Met Asp
110 115 120

Val Ile Asn Met Ser Leu Gly Gly Pro Ser Gly Ser Ala Ala Leu
125 130 135

Lys Ala Ala Val Asp Lys Ala Val Ala Ser Gly Val Val Val Val
140 145 150

Ala Ala Ala Gly Asn Glu Gly Thr Ser Gly Ser Ser Ser Thr Val
155 160 165

Asp Tyr Pro Gly Lys Tyr Pro Ser Val Ile Ala Val Gly Ala Val
170 175 180

Asp Ser Ser Asn Gln Arg Ala Ser Phe Ser Ser Val Gly Pro Glu
185 190 195

Leu Asp Val Met Ala Pro Gly Val Ser Ile Gln Ser Thr Leu Pro
200 205 210

Gly Asn Lys Tyr Gly Ala Tyr Asn Gly Thr Ser Met Ala Ser Pro
215 220 225

His Val Ala Gly Ala Ala Ala Leu Ile Leu Ser Lys His Pro Asn
230 235 240

Trp Thr Asn Thr Gln Val Arg Ser Ser Leu Glu Asn Thr Thr Thr
245 250 255

Lys Leu Gly Asp Ser Phe Tyr Tyr Gly Lys Gly Leu Ile Asn Val
260 265 270

Gln Ala Ala Ala Gln
275

(2) INFORMATION FOR SEQ ID NO: 73: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 4 amino acids 
(B) TYPE: Amino Acid

(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 73: 

Arg Val Arg Arg 
1 4

(2) INFORMATION FOR SEQ ID NO: 74:

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 1146 base pairs 
(B) TYPE: Nucleic Acid
(C) STRANDEDNESS: Single
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 74:



GTG AGA GGC AAA AAA GTA TGG ATC AGT TTG CTG TTT 36 
Val Arg Gly Lys Lys Val Trp Ile Ser Leu Leu Phe
-107 -105 -100 

GCT TTA GCG TTA ATC TTT ACG ATG GCG TTC GGC AGC ACA 75
Ala Leu Ala Leu Ile Phe Thr Met Ala Phe Gly Ser Thr
-95 -90 -85 

TCC TCT GCC CAG GCG GCA GGG AAA TCA AAC GGG GAA AAG 114 
Ser Ser Ala Gln Ala Ala Gly Lys Ser Asn Gly Glu Lys
-80 -75 -70

AAA TAT ATT GTC GGG TTT AAA CAG ACA ATG AGC ACG ATG 153 
Lys Tyr Ile Val Gly Phe Lys Gln Thr Met Ser Thr Met
-65 -60 

AGC GCC GCT AAG AAG AAA GAT GTC ATT TCT GAA AAA GGC 192 
Ser Ala Ala Lys Lys Lys Asp Val Ile Ser Glu Lys Gly
-55 -50 -45 

GGG AAA GTG CAA AAG CAA TTC AAA TAT GTA GAC GCA GCT 231 
Gly Lys Val Gln Lys Gln Phe Lys Tyr Val Asp Ala Ala
-40 -35 

TCA GCT ACA TTA AAC GAA AAA GCT GTA AAA GAA TTG AAA 270
Ser Ala Thr Leu Asn Glu Lys Ala Val Lys Glu Leu Lys 
-30 -25 -20 

AAA GAC CCG AGC GTC GCT TAC GTT GAA GAA GAT CAC GTA 309 
Lys Asp Pro Ser Val Ala Tyr Val Glu Glu Asp His Val 
-15 -10 -5

AGA CAT AAG CGC GCG CAG TCC GTG CCT TAC GGC GTA TCA 348
Arg His Lys Arg Ala Gln Ser Val Pro Tyr Gly Val Ser
1 5

CAA ATT AAA GCC CCT GCT CTG CAC TCT CAA GGC TAC ACT 387
Gln Ile Lys Ala Pro Ala Leu His Ser Gln Gly Tyr Thr
10 15 20

GGA TCA AAT GTT AAA GTA GCG GTT ATC GAC AGC GGT ATC 426 
Gly Ser Asn Val Lys Val Ala Val Ile Asp Ser Gly Ile
25 30 35 

GAT TCT TCT CAT CCT GAT TTA AAG GTA GCA GGC GGA GCC 465
Asp Ser Ser His Pro Asp Leu Lys Val Ala Gly Gly Ala
40 45

AGC ATG GTT CCT TCT GAA ACA AAT CCT TTC CAA GAC AAC 504 
Ser Met Val Pro Ser Glu Thr Asn Pro Phe Gln Asp Asn
50 55 60 

GAC TCT CAC GGA ACT CAC GTT GCC GGC ACA GTT GCG GCT 543
Asp Ser His Gly Thr His Val Ala Gly Thr Val Ala Ala
65 70

CTT AAT AAC TCA ATC GGT GTA TTA GGC GTT GCG CCA AGC 582
Leu Asn Asn Ser Ile Gly Val Leu Gly Val Ala Pro Ser
75 80 85

GCA TCA CTT TAC GCT GTA AAA GTT CTC GGT GCT GAC GGT 621 
Ala Ser Leu Tyr Ala Val Lys Val Leu Gly Ala Asp Gly 
90 95 100 

TCC GGC CAA GAT AGC TGG ATC ATT AAC GGA ATC GAG TGG 660 
Ser Gly Gln Asp Ser Trp Ile Ile Asn Gly Ile Glu Trp
105 110

GCG ATC GCA AAC AAT ATG GAC GTT ATT AAC ATG AGC CTC 699 
Ala Ile Ala Asn Asn Met Asp Val Ile Asn Met Ser Leu
115 120 125 

GGC GGA CCT TCT GGT TCT GCT GCT TTA AAA GCG GCA GTT 738
Gly Gly Pro Ser Gly Ser Ala Ala Leu Lys Ala Ala Val 
130 135 

GAT AAA GCC GTT GCA TCC GGC GTC GTA GTC GTT GCG GCA 777 
Asp Lys Ala Val Ala Ser Gly Val Val Val Val Ala Ala 
140 145 150

GCC GGT AAC GAA GGC ACT TCC GGC AGC TCG TCG ACA GTG 816
Ala Gly Asn Glu Gly Thr Ser Gly Ser Ser Ser Thr Val 
155 160 165 

GAC TAC CCT GGC AAA TAC CCT TCT GTC ATT GCA GTA GGC 855 
Asp Tyr Pro Gly Lys Tyr Pro Ser Val Ile Ala Val Gly
170 175

GCT GTT GAC AGC AGC AAC CAA AGA GCA TCT TTC TCA AGC 894 
Ala Val Asp Ser Ser Asn Gln Arg Ala Ser Phe Ser Ser
180 185 190

GTA GGA CCT GAG CTT GAT GTC ATG GCA CCT GGC GTA TCT 933
Val Gly Pro Glu Leu Asp Val Met Ala Pro Gly Val Ser 
195 200 

ATC CAA AGC ACG CTT CCT GGA AAC AAA TAC GGG GCG TAC 972 
Ile Gln Ser Thr Leu Pro Gly Asn Lys Tyr Gly Ala Tyr
205 210 215

AAC GGT ACC TCA ATG GCA TCT CCG CAC GTT GCC GGA GCG 1011 
Asn Gly Thr Ser Met Ala Ser Pro His Val Ala Gly Ala 
220 225 230 

GCT GCT TTG ATT CTT TCT AAG CAC CCG AAC TGG ACA AAC 1050
Ala Ala Leu Ile Leu Ser Lys His Pro Asn Trp Thr Asn
235 240 

ACT CAA GTC CGC AGC AGT TTA GAA AAC ACC ACT ACA AAA 1089 
Thr Gln Val Arg Ser Ser Leu Glu Asn Thr Thr Thr Lys
245 250 255

CTT GGT GAT TCT TTC TAC TAT GGA AAA GGG CTG ATC AAC 1128 
Leu Gly Asp Ser Phe Tyr Tyr Gly Lys Gly Leu Ile Asn
260 265 

GTA CAG GCG GCA GCT CAG 1146
Val Gln Ala Ala Ala Gln
270 275 
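Because SEQ ID NO: 74 prints each codon row above its three-letter translation, transcription or OCR errors in the listing can be caught by re-translating the rows. The sketch below is illustrative only: it checks a few rows copied from the listing against the standard genetic code, under which the initial GTG reads as Val, matching the listing. The choice of rows and the names translate, mismatched_rows and ROWS are hypothetical.

```python
# Illustrative sketch: re-translate codon rows copied from SEQ ID NO: 74 and
# compare them with the three-letter residues printed beneath each row.

BASES = "TCAG"
# Standard genetic code as one-letter residues for codons in TCAG order
# (TTT, TTC, TTA, TTG, TCT, ..., GGG); "*" marks a stop codon.
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "Stop",
}

# (codon row, listed translation) pairs transcribed from SEQ ID NO: 74.
ROWS = [
    ("GTG AGA GGC AAA AAA GTA TGG ATC AGT TTG CTG TTT",
     "Val Arg Gly Lys Lys Val Trp Ile Ser Leu Leu Phe"),
    ("AAA GAC CCG AGC GTC GCT TAC GTT GAA GAA GAT CAC GTA",
     "Lys Asp Pro Ser Val Ala Tyr Val Glu Glu Asp His Val"),
    ("AGA CAT AAG CGC GCG CAG TCC GTG CCT TAC GGC GTA TCA",
     "Arg His Lys Arg Ala Gln Ser Val Pro Tyr Gly Val Ser"),
    ("GTA CAG GCG GCA GCT CAG", "Val Gln Ala Ala Ala Gln"),
]


def translate(codon_row):
    """Translate space-separated codons into three-letter residue names."""
    return " ".join(THREE_LETTER[CODON_TABLE[c]] for c in codon_row.split())


def mismatched_rows(rows):
    """Return (row index, listed, re-translated) for rows that disagree."""
    return [(i, listed, translate(dna))
            for i, (dna, listed) in enumerate(rows)
            if translate(dna) != listed]


if __name__ == "__main__":
    print(mismatched_rows(ROWS) or "all rows agree")
```

As a further consistency check, the 1146 nucleotides of SEQ ID NO: 74 correspond to 1146 / 3 = 382 codons, which matches the 382-residue length given for SEQ ID NO: 75.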

(2) INFORMATION FOR SEQ ID NO: 75: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 382 amino acids 
(B) TYPE: Amino Acid

(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:75: 

Val Arg Gly Lys Lys Val Trp Ile Ser Leu Leu Phe Ala Leu Ala
-107 -105 -100 -95

Leu Ile Phe Thr Met Ala Phe Gly Ser Thr Ser Ser Ala Gln Ala
-90 -85 -80

Ala Gly Lys Ser Asn Gly Glu Lys Lys Tyr Ile Val Gly Phe Lys
-75 -70 -65

Gln Thr Met Ser Thr Met Ser Ala Ala Lys Lys Lys Asp Val Ile
-60 -55 -50

Ser Glu Lys Gly Gly Lys Val Gln Lys Gln Phe Lys Tyr Val Asp
-45 -40 -35

Ala Ala Ser Ala Thr Leu Asn Glu Lys Ala Val Lys Glu Leu Lys
-30 -25 -20

Lys Asp Pro Ser Val Ala Tyr Val Glu Glu Asp His Val Arg His
-15 -10 -5

Lys Arg Ala Gln Ser Val Pro Tyr Gly Val Ser Gln Ile Lys Ala
1 5 10

Pro Ala Leu His Ser Gln Gly Tyr Thr Gly Ser Asn Val Lys Val
15 20 25

Ala Val Ile Asp Ser Gly Ile Asp Ser Ser His Pro Asp Leu Lys
30 35 40

Val Ala Gly Gly Ala Ser Met Val Pro Ser Glu Thr Asn Pro Phe
45 50 55

Gln Asp Asn Asp Ser His Gly Thr His Val Ala Gly Thr Val Ala
60 65 70

PCT/US96/02861
WO 96/27671

Ala Leu Asn Asn Ser Ile Gly Val Leu Gly Val Ala Pro Ser Ala
75 80 85

Ser Leu Tyr Ala Val Lys Val Leu Gly Ala Asp Gly Ser Gly Gln
90 95 100

Asp Ser Trp Ile Ile Asn Gly Ile Glu Trp Ala Ile Ala Asn Asn
105 110 115

Met Asp Val Ile Asn Met Ser Leu Gly Gly Pro Ser Gly Ser Ala
120 125 130

Ala Leu Lys Ala Ala Val Asp Lys Ala Val Ala Ser Gly Val Val
135 140 145

Val Val Ala Ala Ala Gly Asn Glu Gly Thr Ser Gly Ser Ser Ser
150 155 160

Thr Val Asp Tyr Pro Gly Lys Tyr Pro Ser Val Ile Ala Val Gly
165 170 175

Ala Val Asp Ser Ser Asn Gln Arg Ala Ser Phe Ser Ser Val Gly
180 185 190

Pro Glu Leu Asp Val Met Ala Pro Gly Val Ser Ile Gln Ser Thr
195 200 205

Leu Pro Gly Asn Lys Tyr Gly Ala Tyr Asn Gly Thr Ser Met Ala
210 215 220

Ser Pro His Val Ala Gly Ala Ala Ala Leu Ile Leu Ser Lys His
225 230 235

Pro Asn Trp Thr Asn Thr Gln Val Arg Ser Ser Leu Glu Asn Thr
240 245 250

Thr Thr Lys Leu Gly Asp Ser Phe Tyr Tyr Gly Lys Gly Leu Ile
255 260 265

Asn Val Gln Ala Ala Ala Gln
270 275

(2) INFORMATION FOR SEQ ID NO: 76:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 5 amino acids 

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 76:

Asn Arg Met Arg Lys 
1 5 

(2) INFORMATION FOR SEQ ID NO: 77: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 11 amino acids

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:77: 
Gly Ser Gly Gln Tyr Ser Trp Ile Ile Asn Gly
1 5 10 11 

(2) INFORMATION FOR SEQ ID NO: 78: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 11 amino acids

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:78:

Gly Asp Ile Thr Thr Glu Asp Glu Ala Ala Ser
1 5 10 11

(2) INFORMATION FOR SEQ ID NO:79:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 11 amino acids 

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 79: 

Gly Glu Val Thr Asp Ala Val Glu Ala Arg Ser 
1 5 10 11 

(2) INFORMATION FOR SEQ ID NO: 80: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 11 amino acids 

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 80: 

Pro Phe Met Thr Asp Ile Ile Glu Ala Ser Ser
1 5 10 11

(2) INFORMATION FOR SEQ ID NO: 81: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 11 amino acids 
(B) TYPE: Amino Acid

(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 81: 

Gly Ile Val Thr Asp Ala Ile Glu Ala Ser Ser
1 5 10 11 

(2) INFORMATION FOR SEQ ID NO:82:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 27 base pairs 

(B) TYPE: Nucleic Acid 

(C) STRANDEDNESS: Single 

(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 82: 



GGTTCCGGCC AAGATAGCTG GATCATT 27 
(2) INFORMATION FOR SEQ ID NO:83: 
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 29 base pairs 

(B) TYPE: Nucleic Acid

(C) STRANDEDNESS: Single 
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:83: 



CCAATACAGC TGGGAAATTA ACGGAATCG 29 

(2) INFORMATION FOR SEQ ID NO:84: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 31 base pairs

(B) TYPE: Nucleic Acid 
(C) STRANDEDNESS: Single
(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 84:



GGTTCCGGCC AAGATAGCTG GGAAATTAAC G 31
(2) INFORMATION FOR SEQ ID NO: 85: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: Nucleic Acid 
(C) STRANDEDNESS: Single

(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 85:



AAGAAGATCA CGTAAGACAT AAGCGCGCGC 30 
(2) INFORMATION FOR SEQ ID NO: 86: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 4 amino acids 

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 86:

Arg Ala Lys Arg
1 4 

(2) INFORMATION FOR SEQ ID NO: 87: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 4 amino acids 
(B) TYPE: Amino Acid

(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 87:

Lys Ala Lys Arg 
1 4 

(2) INFORMATION FOR SEQ ID NO: 88:

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 8 amino acids 
(B) TYPE: Amino Acid
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 88: 

Gly Pro Gly Gly Leu Met Arg Lys 
1 5 8 

(2) INFORMATION FOR SEQ ID NO: 89: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 8 amino acids

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:89: 

Gly Pro Gly Gly Lys Ala Lys Arg 
1 5 8 
What is claimed is:

1. A subtilisin variant derived from a precursor subtilisin-type serine protease, said variant
capable of cleaving a polypeptide substrate comprising the sequence:

P4-P3-P2-P1-C(=O)-NH-P1'

wherein:

P4 is a basic amino acid;
P3 is any amino acid selected from the naturally occurring amino acids;
P2 is a basic amino acid;
P1 is a basic amino acid; and
P1' is not Pro.

2. The subtilisin variant of claim 1 containing an acidic amino acid at a residue equivalent to
Asn 62, Tyr 104 and Gly 166 of the subtilisin naturally produced by Bacillus amyloliquefaciens.

3. The subtilisin-type serine protease variant of claim 2 wherein the acidic amino acid is Asp 
or Glu. 

4. The subtilisin-type serine protease variant of claim 3 wherein the acidic amino acid is Asp. 

5. The subtilisin-type serine protease variant of claim 2 wherein the precursor subtilisin-type
serine protease is the subtilisin naturally produced by Bacillus amyloliquefaciens.

6. The subtilisin variant of claim 5 having the amino acid sequence of the mature polypeptide
of Figure 8 (SEQ ID NO: 75).

7. A subtilisin variant having substrate specificity for peptide substrates containing dibasic
amino acid sequences.

8. The subtilisin variant of claim 7 having a different amino acid residue at residue position +62
than the subtilisin naturally produced by Bacillus amyloliquefaciens.

9. The subtilisin variant of Claim 8 having an Asp or Glu at residue position +62. 

10. The subtilisin variant of Claim 9 having an Asp at residue position +62. 

11. The subtilisin variant of Claim 10 further having an Asp or Glu at residue position +166.
12. The subtilisin variant of Claim 11 having an Asp at residue position +166.

13. The subtilisin variant of Claim 12 having the amino acid sequence of the mature polypeptide 
provided in Fig. 6. 

14. An isolated nucleic acid molecule encoding the subtilisin variant of Claim 1.

15. The nucleic acid molecule of Claim 14 further comprising a promoter operably linked to the
nucleic acid molecule.

16. An expression vector comprising the nucleic acid molecule of Claim 15 operably linked to
control sequences recognized by a host cell transformed with the vector.

17. A host cell transformed with the vector of Claim 16. 

18. An isolated nucleic acid molecule encoding the subtilisin variant of Claim 7. 
19. The nucleic acid molecule of Claim 18 further comprising a promoter operably linked to the
nucleic acid molecule. 

20. An expression vector comprising the nucleic acid molecule of Claim 19 operably linked to
control sequences recognized by a host cell transformed with the vector. 

21. A host cell transformed with the vector of Claim 20.

22. A process of using the nucleic acid molecule encoding the subtilisin variant to effect
production of the subtilisin variant comprising culturing the host cell of Claim 21 under conditions suitable
for expression of the subtilisin variant.

23. The process of Claim 22 further comprising recovering the subtilisin variant from the host
cell culture medium.

24. A method of using the subtilisin variant of Claim 1 comprising contacting a fusion protein 
containing a dibasic sequence with the subtilisin variant. 

25. A process for cleaving a polypeptide, said polypeptide comprising an amino acid sequence 
represented by the formula: 

P4-P3-P2-P1-P1'
wherein:

P4 is a basic amino acid;
P3 is an amino acid selected from the naturally occurring amino acids;
P2 is a basic amino acid;
P1 is a basic amino acid; and
P1' is not Pro;
comprising the step of:

subjecting said polypeptide to the subtilisin variant of claim 1 in a reaction mixture under conditions
such that the subtilisin variant cleaves the polypeptide.
26. A process of using the nucleic acid molecule encoding the subtilisin variant to effect
production of the subtilisin variant comprising culturing the host cell of Claim 17 under conditions suitable
for expression of the subtilisin variant.

27. The process of Claim 26 further comprising recovering the subtilisin variant from the host 
cell culture medium. 

28. A method of using the subtilisin variant of Claim 7 comprising contacting a fusion protein
containing a dibasic sequence with the subtilisin variant.

29. A process for cleaving a polypeptide, said polypeptide comprising an amino acid sequence 
represented by the formula: 
P4-P3-P2-P1-P1'
wherein:

P4 is a large hydrophobic amino acid;
P3 is an amino acid selected from the naturally occurring amino acids;
P2 is a basic amino acid;
P1 is a basic amino acid; and
P1' is not Pro;
comprising the step of:

subjecting said polypeptide to the subtilisin variant of claim 7 in a reaction mixture under conditions
such that the subtilisin variant cleaves the polypeptide.
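The substrate pattern recited in claims 1 and 25 (P4, P2 and P1 basic; P3 any residue; P1' not Pro) can be illustrated with a short sequence scan. The following is a minimal illustrative sketch, assuming that "basic amino acid" means Arg, Lys or His and that residues are written as three-letter codes, as in the sequence listing; the function name `cleavage_sites` is hypothetical. It reports each 0-based index whose residue could occupy P1, so that the P1-P1' peptide bond would be the scissile bond.

```python
from typing import List

# Assumption: "basic amino acid" is taken to mean Arg, Lys or His.
BASIC = {"Arg", "Lys", "His"}


def cleavage_sites(residues: List[str]) -> List[int]:
    """Return 0-based indices i where residues[i] can act as P1.

    Pattern checked (hypothetical helper illustrating claims 1 and 25):
    P4 = residues[i - 3] is basic, P3 = residues[i - 2] is unrestricted,
    P2 = residues[i - 1] is basic, P1 = residues[i] is basic, and
    P1' = residues[i + 1] exists and is not Pro.
    """
    sites = []
    # P1 needs three residues before it (P4..P2) and one after it (P1').
    for i in range(3, len(residues) - 1):
        p4, p2, p1, p1_prime = residues[i - 3], residues[i - 1], residues[i], residues[i + 1]
        if p4 in BASIC and p2 in BASIC and p1 in BASIC and p1_prime != "Pro":
            sites.append(i)
    return sites
```

For example, appending Ser to SEQ ID NO: 87 (Lys Ala Lys Arg) gives a single candidate site, between Arg and Ser, whereas appending Pro gives none; SEQ ID NO: 89 on its own yields no site because its terminal Arg has no P1' residue.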
FIGURE 8